memray-array

Measuring memory usage of Zarr array storage operations using memray.

In an ideal world array storage operations would be zero-copy, but many libraries do not achieve this in practice. The scripts here measure the actual behaviour across different filesystems (local/cloud), Zarr stores (local/s3fs/obstore), compression settings (using numcodecs), Zarr Python versions (v2/v3), and Zarr formats (2/3).

Updates

TL;DR: we still need to fix

Summary

The workload is simple: create a random 100MB NumPy array and write it to Zarr storage in a single chunk. Then (in a separate process) read it back from storage into a new NumPy array.
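For reference, here is a minimal sketch of the workload (using the generic zarr.open API and a placeholder path; the actual script used for the measurements is memray-array.py):

import numpy as np
import zarr

# ~100 MB of float64 data
arr = np.random.rand(100 * 1024 * 1024 // 8)

# Write: a single chunk covering the whole array
z = zarr.open("example.zarr", mode="w", shape=arr.shape, chunks=arr.shape, dtype=arr.dtype)
z[:] = arr

# Read back (in the experiments this happens in a separate process)
out = zarr.open("example.zarr", mode="r")[:]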

  • Writes with no compression incur a single buffer copy, except for Zarr v2 writing to the local filesystem. (This shows that zero copy is possible, at least.)
  • Writes with compression incur a second buffer copy, since implementations first write the compressed bytes into a separate buffer, which has to be roughly the size of the uncompressed bytes (it is not known in advance how compressible the data is). See the sketch after this list.
  • Reads with no compression incur a single copy from local files, but two copies from S3 (except for obstore, which has a single copy). This seems to be because the S3 libraries read lots of small blocks then join them into a larger one, whereas local files can be read in one go into a single buffer.
  • Reads with compression incur two buffer copies, except for Zarr v2 reading from the local filesystem.
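To illustrate the compressed-write point above: numcodecs codecs hand back the compressed data as a new bytes-like object, i.e. a second buffer alongside the original array. (A sketch only; the Blosc settings here are illustrative, and the extra resize copy discussed later is internal to the codec.)

import numpy as np
from numcodecs import Blosc

arr = np.random.rand(100 * 1024 * 1024 // 8)   # ~100 MB

codec = Blosc(cname="zstd", clevel=1)
compressed = codec.encode(arr)   # compressed bytes land in a new, separate buffer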

It would seem there is scope to reduce the number of copies in some of these cases.

Writes

Number of extra copies needed to write an array to storage using Zarr. (Links are to memray flamegraphs. Bold indicates best achievable.)

Filesystem | Store   | Zarr Python version | Zarr format | Uncompressed | Compressed
-----------|---------|---------------------|-------------|--------------|-----------
Local      | local   | v2 (2.18.5)         | 2           | 0            | 2
Local      | local   | v3 (3.0.6)          | 3           | 1            | 2
Local      | local   | v3 (dev) (1)        | 3           | 0            | 1
Local      | obstore | v3 (dev) (1)        | 3           | 1            | 2
S3         | s3fs    | v2 (2.18.5)         | 2           | 1            | 2
S3         | s3fs    | v3 (3.0.6)          | 3           | 1            | 2
S3         | obstore | v3 (3.0.6)          | 3           | 1            | 2

(1) Zarr v3 (dev) includes zarr-developers/zarr-python#2944 and zarr-developers/numcodecs#656

Reads

Number of extra copies needed to read an array from storage using Zarr. (Links are to memray flamegraphs. Bold indicates best achievable.)

Filesystem | Store   | Zarr Python version | Zarr format | Uncompressed | Compressed
-----------|---------|---------------------|-------------|--------------|-----------
Local      | local   | v2 (2.18.5)         | 2           | 1            | 1
Local      | local   | v3 (3.0.6)          | 3           | 1            | 2
Local      | obstore | v3 (3.0.6)          | 3           | 1            | 2
S3         | s3fs    | v2 (2.18.5)         | 2           | 2            | 2
S3         | s3fs    | v3 (3.0.6)          | 3           | 2            | 2
S3         | obstore | v3 (3.0.6)          | 3           | 1            | 2

Discussion

This section delves into what is happening on the different code paths, and suggests some remedies to reduce the number of buffer copies.

Writes

  • Local uncompressed writes (v2 only) - actual copies 0, desired copies 0

    • This is the only zero-copy case. The numpy array is passed directly to the file's write() method (in DirectoryStore), and since arrays implement the buffer protocol, no copy is made (see the sketch after this list).
  • S3 uncompressed writes (v2 only) - actual copies 1, desired copies 0

    • A copy of the numpy array is made by this code in fsspec (in maybe_convert, called from FSMap.setitems()): bytes(memoryview(value)).
    • Remedy: it might be possible to use the memoryview in fsspec and avoid the copy (see fsspec/s3fs#959), but it's probably better to focus on improvements to v3 (see below).
  • Uncompressed writes (v3 only) - actual copies 1, desired copies 0

  • Compressed writes - actual copies 2, desired copies 1

    • It is surprising that there are two copies, not one, given that the uncompressed case has zero copies (for local v2, at least). What's happening is that the numcodecs blosc compressor is making an extra copy when it resizes the compressed buffer. A similar thing happens for lz4 and zstd.
    • Remedy: the issue is tracked in zarr-developers/numcodecs#717.
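The two uncompressed write paths above can be seen in miniature with plain Python (a sketch with a placeholder file name, not the actual DirectoryStore or fsspec code):

import numpy as np

arr = np.random.rand(100 * 1024 * 1024 // 8)   # ~100 MB

# Zero-copy path: an ndarray supports the buffer protocol, so a file's
# write() can consume it directly without an intermediate buffer.
with open("chunk.bin", "wb") as f:
    f.write(arr)

# Copying path: materialising the data as a bytes object (as fsspec's
# maybe_convert does) allocates a second ~100 MB buffer first.
payload = bytes(memoryview(arr))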

Reads

  • Local reads (v2 only) - actual copies 1, desired copies 0

    • The Zarr Python v2 read pipeline separates reading the bytes from storage from filling the output array - see _process_chunk(). So there is necessarily a buffer copy, since the bytes are never read directly into the output array (see the sketch after this list).
    • Remedy: Zarr Python v2 is in bugfix mode now so there is no point in trying to change it to make fewer buffer copies. The changes would be quite invasive anyway.
  • Local reads (v3 only), plus obstore local and S3 - actual copies 1 (2 for compressed), desired copies 0 (1 for compressed)

    • The Zarr Python v3 CodecPipeline has a read() method that separates reading the bytes from storage from filling the output array (just like v2). The ByteGetter class has no way of reading directly into an output array.
    • Remedy: this could be fixed by zarr-developers/zarr-python#2904, but it is potentially a major change to Zarr's internals.
  • S3 reads (s3fs only) - actual copies 2, desired copies 0

    • Both the Python asyncio SSL library and aiohttp introduce a buffer copy when reading from S3 (using s3fs).
    • Remedy: unclear
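The read-pipeline copy, and the copy-free alternative the remedies aim for, look roughly like this (a sketch with a placeholder file name, not the actual Zarr code paths):

import numpy as np

shape, dtype = (100 * 1024 * 1024 // 8,), np.float64
out = np.empty(shape, dtype=dtype)

# Current pipeline shape: fetch all the bytes first, then fill the output
# array from them. The intermediate bytes object is the extra buffer copy.
with open("chunk.bin", "rb") as f:
    data = f.read()                          # extra ~100 MB buffer
out[...] = np.frombuffer(data, dtype=dtype)

# Copy-free alternative: read straight into the preallocated output array.
with open("chunk.bin", "rb") as f:
    f.readinto(out)                          # fills out in place, no extra buffer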

Related issues

How to run

Create a new virtual env (for Python 3.11), then run:

pip install -r requirements.txt

Local

pip install -U 'zarr<3' 'numcodecs<0.16.0'
python memray-array.py write
python memray-array.py write --no-compress
python memray-array.py read
python memray-array.py read --no-compress

pip install -U 'zarr>3' 'numcodecs<0.16.0'
python memray-array.py write
python memray-array.py write --no-compress
python memray-array.py read
python memray-array.py read --no-compress

pip install -U 'git+https://github.com./zarr-developers/zarr-python#egg=zarr' 'numcodecs>=0.16.0'
python memray-array.py write --library obstore
python memray-array.py write --no-compress --library obstore
python memray-array.py read --library obstore
python memray-array.py read --no-compress --library obstore

pip install -U 'git+https://github.com./tomwhite/zarr-python@memray-array-testing#egg=zarr' 'numcodecs>=0.16.0'
python memray-array.py write
python memray-array.py write --no-compress

S3

These can take a while to run (unless run from within AWS).

Note: change the URL to an S3 bucket you own and have already created.

pip install -U 'zarr<3' 'numcodecs<0.16.0'
python memray-array.py write --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --store-prefix=s3://cubed-unittest/mem-array

pip install -U 'zarr>3' 'numcodecs<0.16.0'
python memray-array.py write --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --store-prefix=s3://cubed-unittest/mem-array

pip install -U 'git+https://github.com./zarr-developers/zarr-python#egg=zarr' 'numcodecs<0.16.0'
export AWS_DEFAULT_REGION=...
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
python memray-array.py write --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --library obstore --store-prefix=s3://cubed-unittest/mem-array

Memray flamegraphs

mkdir -p flamegraphs
(cd profiles; for f in $(ls *.bin); do echo $f; python -m memray flamegraph --temporal -f -o ../flamegraphs/$f.html $f; done)

Or just run make.
