Measuring memory usage of Zarr array storage operations using memray.
In an ideal world, array storage operations would be zero-copy, but many libraries do not achieve this in practice. The scripts here measure the actual behaviour across different filesystems (local/cloud), Zarr stores (local/s3fs/obstore), compression settings (using numcodecs), Zarr Python versions (v2/v3), and Zarr formats (2/3).
- 21 April 2025. Zarr Python 3.0.7 was released, which included the fix for zarr-developers/zarr-python#2944
- 8 April 2025. Numcodecs 0.16.0 was released which fixed zarr-developers/numcodecs#717, reducing the number of buffer copies in compressed writes by one.
- 3 April 2025. zarr-developers/zarr-python#2944 was merged, reducing the number of buffer copies for local writes using Zarr v3 by one.
- 6 March 2025. First commit in this repo.
TL;DR: there are still extra buffer copies to eliminate, particularly on the read path (see the findings below).
The workload is simple: create a random 100MB NumPy array and write it to Zarr storage in a single chunk. Then (in a separate process) read it back from storage into a new NumPy array.
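
A minimal sketch of the workload (illustrative, not the repo's actual `memray-array.py`; assumes the Zarr Python v3 API):

```python
import numpy as np
import zarr

shape = (5000, 2500)  # 5000 * 2500 * 8 bytes = 100 MB of float64
data = np.random.rand(*shape)

# Write the array to storage as a single chunk
z = zarr.create_array("data.zarr", shape=shape, chunks=shape, dtype=data.dtype)
z[:] = data

# Read it back into a new NumPy array (the benchmark does this in a separate process)
result = zarr.open_array("data.zarr", mode="r")[:]
```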
- Writes with no compression incur a single buffer copy, except for local filesystem writes with Zarr v2 (and with Zarr v3 since zarr-developers/zarr-python#2944; see the table below). (This shows that zero copy is possible, at least.)
- Writes with compression incur a second buffer copy, since implementations first write the compressed bytes into another buffer, which has to be around the size of the uncompressed bytes (since it is not known in advance how compressible the original is).
- Reads with no compression incur a single copy from local files, but two copies from S3 (except for obstore, which has a single copy). This seems to be because the S3 libraries read lots of small blocks then join them into a larger one, whereas local files can be read in one go into a single buffer.
- Reads with compression incur two buffer copies, except for Zarr v2 reading from the local filesystem.
It would seem there is scope to reduce the number of copies in some of these cases.
Number of extra copies needed to write an array to storage using Zarr. (Links are to memray flamegraphs. Bold indicates best achievable.)
| Filesystem | Store | Zarr Python version | Zarr format | Uncompressed | Compressed |
| --- | --- | --- | --- | --- | --- |
| Local | local | v2 (2.18.5) | 2 | 0 | 2 |
| Local | local | v3 (3.0.6) | 3 | 1 | 2 |
| Local | local | v3 (dev) (1) | 3 | 0 | 1 |
| Local | obstore | v3 (dev) (1) | 3 | 1 | 2 |
| S3 | s3fs | v2 (2.18.5) | 2 | 1 | 2 |
| S3 | s3fs | v3 (3.0.6) | 3 | 1 | 2 |
| S3 | obstore | v3 (3.0.6) | 3 | 1 | 2 |
(1) Zarr v3 (dev) includes zarr-developers/zarr-python#2944 and zarr-developers/numcodecs#656
Number of extra copies needed to read an array from storage using Zarr. (Links are to memray flamegraphs. Bold indicates best achievable.)
| Filesystem | Store | Zarr Python version | Zarr format | Uncompressed | Compressed |
| --- | --- | --- | --- | --- | --- |
| Local | local | v2 (2.18.5) | 2 | 1 | 1 |
| Local | local | v3 (3.0.6) | 3 | 1 | 2 |
| Local | obstore | v3 (3.0.6) | 3 | 1 | 2 |
| S3 | s3fs | v2 (2.18.5) | 2 | 2 | 2 |
| S3 | s3fs | v3 (3.0.6) | 3 | 2 | 2 |
| S3 | obstore | v3 (3.0.6) | 3 | 1 | 2 |
This section delves into what is happening on the different code paths and suggests some remedies to reduce the number of buffer copies.
- **Local uncompressed writes (v2 only)** - actual copies 0, desired copies 0
  - This is the only zero-copy case. The numpy array is passed directly to the file's `write()` method (in `DirectoryStore`), and since arrays implement the buffer protocol, no copy is made.
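
  A minimal sketch of why this is zero-copy (a standalone illustration, not the repo's code): binary file objects accept any bytes-like object, so a numpy array can be passed to `write()` without creating an intermediate `bytes` object.

  ```python
  import numpy as np

  arr = np.random.rand(1000)
  with open("chunk.bin", "wb") as f:
      f.write(arr)              # zero-copy: arr's buffer is written directly
      f.write(memoryview(arr))  # equivalent, and also zero-copy
  ```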
- **S3 uncompressed writes (v2 only)** - actual copies 1, desired copies 0
  - A copy of the numpy array is made by this code in fsspec (in `maybe_convert`, called from `FSMap.setitems()`): `bytes(memoryview(value))`.
  - Remedy: it might be possible to use the memory view in fsspec and avoid the copy (see fsspec/s3fs#959), but it's probably better to focus on improvements to v3 (see below).
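
  A small demonstration of where the copy comes from (a standalone illustration of the same expression, not fsspec's code): `bytes(memoryview(x))` materialises a new `bytes` object, whereas the `memoryview` alone shares the array's memory.

  ```python
  import numpy as np

  arr = np.zeros(10, dtype=np.uint8)
  view = memoryview(arr)  # no copy: shares arr's buffer
  copy = bytes(view)      # copy: allocates a new bytes object

  arr[0] = 1
  assert view[0] == 1     # the view sees the change
  assert copy[0] == 0     # the bytes snapshot does not
  ```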
- **Uncompressed writes (v3 only)** - actual copies 1, desired copies 0
  - A copy of the numpy array is made by this code (in Zarr's `LocalStore`): `memoryview(value.as_numpy_array().tobytes())`. A similar thing happens in `FsspecStore` and obstore.
  - Remedy: this could be fixed with zarr-developers/zarr-python#2925, so the `value` `Buffer` is exposed via the buffer protocol without making a copy.
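
  A sketch of the difference (a standalone illustration): `tobytes()` always copies, while a `memoryview` of the array shares its memory, which is what exposing the `Buffer` via the buffer protocol would allow.

  ```python
  import numpy as np

  arr = np.arange(4, dtype=np.uint8)
  copied = memoryview(arr.tobytes())  # tobytes() allocates and copies
  shared = memoryview(arr)            # buffer protocol: no copy

  arr[0] = 99
  assert copied[0] == 0   # snapshot taken before the change
  assert shared[0] == 99  # shares the array's memory
  ```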
- **Compressed writes** - actual copies 2, desired copies 1
  - It is surprising that there are two copies, not one, given that the uncompressed case has zero copies (for local v2, at least). What's happening is that the numcodecs blosc compressor makes an extra copy when it resizes the compressed buffer. A similar thing happens for lz4 and zstd.
  - Remedy: the issue is tracked in zarr-developers/numcodecs#717.
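
  A sketch of the over-allocate-then-shrink pattern (using a hypothetical `compress_into` helper, not numcodecs' actual API): the compressed size is not known in advance, so the codec writes into a worst-case-sized buffer and then shrinks it, and the shrink copies the compressed bytes once more.

  ```python
  MAX_OVERHEAD = 16  # worst-case expansion allowance, blosc-style

  def compress(data, compress_into):
      # compress_into(src, dst) writes compressed bytes into dst and returns
      # the compressed size (hypothetical stand-in for a C codec function)
      dst = bytearray(len(data) + MAX_OVERHEAD)  # worst-case sized buffer
      n = compress_into(data, dst)
      return bytes(memoryview(dst)[:n])          # the shrink: one extra copy
  ```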
- **Local reads (v2 only)** - actual copies 1, desired copies 0
  - The Zarr Python v2 read pipeline separates reading the bytes from storage from filling the output array - see `_process_chunk()`. So there is necessarily a buffer copy, since the bytes are never read directly into the output array.
  - Remedy: Zarr Python v2 is in bugfix mode now, so there is no point in trying to change it to make fewer buffer copies. The changes would be quite invasive anyway.
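
  A standalone sketch of the difference (not Zarr's actual code): reading into an intermediate `bytes` object costs one extra buffer, whereas `readinto()` can fill the output array's buffer directly.

  ```python
  import numpy as np

  np.random.rand(1000).tofile("chunk.bin")  # an example 8000-byte chunk on disk
  out = np.empty(1000, dtype=np.float64)

  # What a read-then-fill pipeline effectively does (one extra buffer):
  with open("chunk.bin", "rb") as f:
      data = f.read()                      # allocates an intermediate bytes object
  out[:] = np.frombuffer(data, out.dtype)  # copies into the output array

  # The zero-copy alternative:
  with open("chunk.bin", "rb") as f:
      f.readinto(out)                      # fills out's buffer directly
  ```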
- **Local reads (v3 only), plus obstore local and S3** - actual copies 1 (2 for compressed), desired copies 0 (1 for compressed)
  - The Zarr Python v3 `CodecPipeline` has a `read()` method that separates reading the bytes from storage from filling the output array (just like v2). The `ByteGetter` class has no way of reading directly into an output array.
  - Remedy: this could be fixed by zarr-developers/zarr-python#2904, but it is potentially a major change to Zarr's internals.
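
  A sketch of the interface gap (hypothetical protocol names, not Zarr's actual classes): a getter that can only return a new buffer forces a copy, while a `get_into`-style method could fill the caller's buffer directly.

  ```python
  from typing import Protocol

  class GetterToday(Protocol):
      async def get(self) -> bytes:
          """Returns a freshly allocated buffer; the caller must copy it out."""

  class GetterZeroCopy(Protocol):
      async def get_into(self, buf: memoryview) -> int:
          """Hypothetical: fills the caller's buffer, returns the bytes read."""
  ```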
- **S3 reads (s3fs only)** - actual copies 2, desired copies 0
  - Both the Python asyncio SSL library and aiohttp introduce a buffer copy when reading from S3 (using s3fs).
  - Remedy: unclear.
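
  One contributing pattern (an illustration of the block-joining behaviour noted in the findings above, not s3fs's actual code): network stacks deliver many small reads, and joining them into a single payload is itself a full copy.

  ```python
  chunks = [bytes(65536) for _ in range(16)]  # e.g. many small TLS/HTTP reads
  payload = b"".join(chunks)                  # copies every byte into one new buffer
  ```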
- [cubed] Improve memory model by explicitly modelling buffer copies - cubed-dev/cubed#701 (fixed)
- [zarr-python] Codec pipeline memory usage - zarr-developers/zarr-python#2904
- [zarr-python] Add `Buffer.as_buffer_like` method - zarr-developers/zarr-python#2925
- [zarr-python] Avoid memory copy in local store write - zarr-developers/zarr-python#2944 (fixed)
- [zarr-python] Avoid memory copy in obstore write - zarr-developers/zarr-python#2972
- [numcodecs] Extra memory copies in blosc, lz4, and zstd compress functions - zarr-developers/numcodecs#717 (fixed)
- [numcodecs] Switch `Buffer`s to `memoryview`s - zarr-developers/numcodecs#656 (fixed)
- [s3fs] Using the Python buffer protocol in `pipe` - fsspec/s3fs#959
Create a new virtual env (for Python 3.11), then run:

```shell
pip install -r requirements.txt
```

Run the local filesystem benchmarks:

```shell
# Zarr Python v2
pip install -U 'zarr<3' 'numcodecs<0.16.0'
python memray-array.py write
python memray-array.py write --no-compress
python memray-array.py read
python memray-array.py read --no-compress

# Zarr Python v3
pip install -U 'zarr>3' 'numcodecs<0.16.0'
python memray-array.py write
python memray-array.py write --no-compress
python memray-array.py read
python memray-array.py read --no-compress

# Zarr Python v3 (dev) with obstore
pip install -U 'git+https://github.com./zarr-developers/zarr-python#egg=zarr' 'numcodecs>=0.16.0'
python memray-array.py write --library obstore
python memray-array.py write --no-compress --library obstore
python memray-array.py read --library obstore
python memray-array.py read --no-compress --library obstore

# Zarr Python v3 (tomwhite's memray-array-testing branch)
pip install -U 'git+https://github.com./tomwhite/zarr-python@memray-array-testing#egg=zarr' 'numcodecs>=0.16.0'
python memray-array.py write
python memray-array.py write --no-compress
```
Run the S3 benchmarks. These can take a while to run (unless run from within AWS). Note: change the URL to an S3 bucket you own and have already created.

```shell
# Zarr Python v2
pip install -U 'zarr<3' 'numcodecs<0.16.0'
python memray-array.py write --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --store-prefix=s3://cubed-unittest/mem-array

# Zarr Python v3
pip install -U 'zarr>3' 'numcodecs<0.16.0'
python memray-array.py write --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --store-prefix=s3://cubed-unittest/mem-array

# Zarr Python v3 (dev) with obstore (needs explicit AWS credentials)
pip install -U 'git+https://github.com./zarr-developers/zarr-python#egg=zarr' 'numcodecs<0.16.0'
export AWS_DEFAULT_REGION=...
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
python memray-array.py write --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --library obstore --store-prefix=s3://cubed-unittest/mem-array
```
To generate flamegraphs from the memray profiles:

```shell
mkdir -p flamegraphs
(cd profiles; for f in $(ls *.bin); do echo $f; python -m memray flamegraph --temporal -f -o ../flamegraphs/$f.html $f; done)
```

Or just run `make`.