-
-
Notifications
You must be signed in to change notification settings - Fork 325
Avoid memory copy in obstore write #2972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Avoid memory copy in obstore write #2972
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tom! 🙏
Had a question below
src/zarr/storage/_obstore.py
Outdated
@@ -145,15 +147,15 @@ async def set(self, key: str, value: Buffer) -> None: | |||
|
|||
self._check_writable() | |||
|
|||
buf = value.to_bytes() | |||
buf = value.as_numpy_array().view(np.uint8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worthwhile to make this a standard method? This would make it easier to use throughout the codebase
Some possible names:
to_bytelike
to_binary
to_uint8s
- ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess i'm also curious to know why we have a to_bytes()
method that makes a copy, when .as_numpy_array().view(np.uint8)
achieves the same effective result (an iterable of bytes) without a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was suggested in #2925, so I've added an as_bytes_like
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes
objects always own their memory. So creating a new bytes
object means making a copy
If we prefer to move away from this approach, we could add a deprecation cycle to move from to_bytes
to as_bytes_like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think FsspecStore
works with BytesLike
since it didn't accept a memoryview
object when I tried it. So we probably need to keep to_bytes()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this would be a good issue to raise with fsspec
if we can reproduce with it directly
If not, a Zarr issue would be welcome
Ideally these copies should be avoidable in the FsspecStore
case as well
Edit: Ofc this is non-blocking, just would like to improve the performance in all cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened fsspec/s3fs#959
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some possible names:
I'd also suggest as_buffer_like
, because in my head "bytes" in Python is a type that always owns its own memory, while "buffer" doesn't necessarily own its memory, as in the buffer protocol or collections.abc.Buffer
.
This removes an unnecessary memory copy when writing using obstore, similar to #2944
TODO:
docs/user-guide/*.rst
changes/