
Initial GPU support #1967


Merged: 51 commits, Aug 30, 2024

Conversation

@akshaysubr (Contributor) commented on Jun 14, 2024

Adding an initial implementation to support GPU arrays. It is currently limited to arrays that support the __cuda_array_interface__ protocol (CuPy, Numba, PyTorch). This can be extended later to DLPack for JAX and TensorFlow support.
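As a rough, hedged illustration (CuPy-based, not code from this PR), these are the arrays in scope: anything exposing the CUDA Array Interface attribute.

import cupy as cp  # the optional [gpu] dependency

# GPU arrays targeted by this PR advertise the CUDA Array Interface,
# which describes the device pointer, shape, dtype, and strides.
gpu_arr = cp.arange(16, dtype=cp.float32)
assert hasattr(gpu_arr, "__cuda_array_interface__")
print(gpu_arr.__cuda_array_interface__["typestr"], gpu_arr.__cuda_array_interface__["shape"])

# JAX and TensorFlow arrays do not expose this attribute; supporting them
# would go through DLPack instead, as noted above.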

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@d-v-b (Contributor) commented on Jun 14, 2024

Thanks for working on this @akshaysubr. What would we need to change on the CI side to run GPU tests?

@rabernat (Contributor) left a comment

Very excited about this! Looks really good! I now understand better why we went to the trouble of creating these different buffer protocols.

Here's some random low-priority feedback based on a first read through.

@akshaysubr (Contributor, author):

@d-v-b For CI, a GPU runner would be needed and the pipeline should install the optional [gpu] dependencies (just cupy).

@rabernat (Contributor):

I would have thought that this line

dependency-set: ["minimal", "optional"]

would have picked up this new optional dependency:

gpu = [
"cupy>=13.0.0",
]

But that didn't happen in this CI run: https://github.com./zarr-developers/zarr-python/actions/runs/9555282123/job/26338638922?pr=1967#step:5:1

@akshaysubr akshaysubr force-pushed the gpu-buffer-implementation branch from f67a68e to f1f4029 Compare June 18, 2024 05:45
@akshaysubr akshaysubr marked this pull request as ready for review June 18, 2024 05:47
@madsbk (Contributor) left a comment

Looks good, I only have minor suggestions.

But I wonder if we should move all of this to its own file: gpu_buffer.py?

@d-v-b (Contributor) commented on Jun 18, 2024

But I wonder if we should move all of this to its own file: gpu_buffer.py ?

I could imagine something like this:

src/zarr/buffer 
|- gpu.py
|   |- Buffer
|- cpu.py
|   |- Buffer

leading to usage like

from zarr.buffer import cpu, gpu
...
cpu.Buffer()
...
gpu.Buffer()

@akshaysubr akshaysubr force-pushed the gpu-buffer-implementation branch from a87cc33 to 864b774 Compare June 28, 2024 06:26
@madsbk (Contributor) left a comment

Looks good, nice work.
I have a minor suggestion, and I think the GPU buffer should get its own file like @d-v-b suggests: #1967 (comment)

@jhamman jhamman added the V3 label Jul 1, 2024
@akshaysubr akshaysubr force-pushed the gpu-buffer-implementation branch from f232d79 to 4e18098 Compare July 8, 2024 23:34
@akshaysubr (Contributor, author):

@d-v-b Thanks for the suggestion to refactor into two separate files. I made those changes and added some more that I think benefit the overall usage of these buffer classes. Essentially, zarr.buffer.Buffer is now an abstract class, and zarr.buffer.cpu.Buffer and zarr.buffer.gpu.Buffer are concrete implementations, which makes things cleaner. It also requires implementations to be explicit about which type of Buffer they intend to use, which I think is a good thing.
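A hedged sketch of the shape of this refactor (simplified names and signatures, not the exact code in the PR):

from abc import ABC, abstractmethod
import numpy as np

class Buffer(ABC):  # plays the role of zarr.buffer.Buffer (abstract base)
    def __init__(self, array) -> None:
        self._data = array

    @classmethod
    @abstractmethod
    def from_bytes(cls, data: bytes) -> "Buffer": ...

class CPUBuffer(Buffer):  # plays the role of zarr.buffer.cpu.Buffer (NumPy-backed)
    @classmethod
    def from_bytes(cls, data: bytes) -> "CPUBuffer":
        return cls(np.frombuffer(data, dtype="B"))

class GPUBuffer(Buffer):  # plays the role of zarr.buffer.gpu.Buffer (CuPy-backed)
    @classmethod
    def from_bytes(cls, data: bytes) -> "GPUBuffer":
        import cupy as cp
        return cls(cp.asarray(np.frombuffer(data, dtype="B")))  # host -> device copy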


@classmethod
@abstractmethod
Contributor:

Test failures are due to this method becoming abstract while it still gets called in to_buffer_dict in metadata.py. Two things should change to fix this: first, I think the order of the decorators here should maybe be flipped to ensure that Buffer.from_bytes can't be called without an exception, and second, metadata.py needs to stop calling Buffer methods directly.

@akshaysubr (author):

I'm not sure we can flip the order of decorators here based on this snippet from the abstractmethod docs:

When abstractmethod() is applied in combination with other method descriptors, it should be applied as the innermost decorator

I agree though that metadata.py shouldn't be calling any Buffer methods and should instead be calling prototype.buffer methods or default_buffer_prototype.buffer methods. Is there a preference between propagating a prototype argument up the call stack or just using default_buffer_prototype since these to_bytes calls are not in the critical performance path?
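A small standalone illustration of the constraint being discussed (not zarr code): the documented order keeps abstractmethod innermost, and marking a classmethod abstract blocks instantiating the base class but does not by itself make direct calls on it raise.

from abc import ABC, abstractmethod

class Base(ABC):
    @classmethod
    @abstractmethod  # abstractmethod must be the innermost decorator
    def from_bytes(cls, data: bytes) -> "Base":
        raise NotImplementedError("use a concrete subclass")

try:
    Base()  # TypeError: can't instantiate abstract class Base
except TypeError:
    pass

try:
    Base.from_bytes(b"")  # raises only because the body raises explicitly
except NotImplementedError:
    pass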

@akshaysubr (author):

Resolved the CI issues by tracking down all calls to abstract classmethods of Buffer and NDBuffer. Here are the main ones and the current solution (a sketch of the substitution follows the list):

  • metadata.py: switch to using default_buffer_prototype.buffer
  • group.py: switch to using default_buffer_prototype.buffer
  • sharding.py: switch to using default_buffer_prototype.buffer
  • codecs/_v2.py: switch to using cpu.Buffer and cpu.NDBuffer since these are all explicitly CPU only.
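A simplified sketch of what that substitution looks like; the BufferPrototype and default_buffer_prototype spellings here are schematic stand-ins, and the real definitions are the ones in the PR:

from typing import NamedTuple

class Buffer: ...      # stand-in for the concrete cpu.Buffer
class NDBuffer: ...    # stand-in for the concrete cpu.NDBuffer

class BufferPrototype(NamedTuple):
    buffer: type[Buffer]
    nd_buffer: type[NDBuffer]

default_buffer_prototype = BufferPrototype(buffer=Buffer, nd_buffer=NDBuffer)

# Call sites in metadata.py / group.py / sharding.py then go through
#     default_buffer_prototype.buffer.from_bytes(...)
# rather than calling the abstract Buffer.from_bytes classmethod directly.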

matrix:
  python-version: ['3.10', '3.11', '3.12']
  numpy-version: ['1.24', '1.26', '2.0']
  dependency-set: ["minimal", "optional"]
@akshaysubr (author):

@jhamman What test matrix do we want for GPU testing? The current config (3 Python versions x 3 NumPy versions x 2 dependency sets = 18 GPU jobs) seems a bit excessive; it might be worth cutting it down to the bare minimum to keep CI costs down.

- name: Install Hatch and CuPy
  run: |
    python -m pip install --upgrade pip
    pip install hatch cupy-cuda12x
@akshaysubr (author):

Manually adding this here for now. Should make this part of the hatch workflow later.

Member:

Yeah there's probably a better way to handle this. Happy to chat when this is ready for discussion

@akshaysubr (author):

This is the only thing remaining for this PR, so any solutions here would be great!

Member:

Great, did an initial pass below

We should be able to use the self-referential extra with hatch as well

Comment on lines 24 to 26
python-version: ['3.10', '3.11', '3.12']
numpy-version: ['1.24', '1.26', '2.0']
dependency-set: ["minimal", "optional"]
@jhamman (Member) commented on Aug 23, 2024

What if we did something like this for now:

Suggested change:
- python-version: ['3.10', '3.11', '3.12']
- numpy-version: ['1.24', '1.26', '2.0']
- dependency-set: ["minimal", "optional"]
+ python-version: ['3.11']
+ numpy-version: ['2.0']
+ dependency-set: ["gpu"]

The gpu dependency set would need to be defined in pyproject.toml

@akshaysubr (author):

This would be ideal. There is currently a gpu dependency set in pyproject.toml, but I'm not sure how to get hatch to pick it up.

@jakirkham (Member) left a comment

Thanks Akshay! 🙏

I think it would be worthwhile to put these under cuda12.

I don't know if we want CUDA 11 at this stage (probably not, given this is new and CUDA 11 is getting older). Though I do think putting the CUDA 12 dependencies under that name will make it easier to upgrade in the future.

If we do use this naming, we may want to reflect something similar elsewhere (cuda for when the version doesn't matter and cuda12 where it does).

Comment on lines +77 to +79
gpu = [
"cupy-cuda12x",
]
Member:

Suggested change:
- gpu = [
-   "cupy-cuda12x",
- ]
+ cuda12 = [
+   "cupy-cuda12x",
+ ]

pyproject.toml (outdated)
  ]
- features = ["test", "extra"]
+ features = ["test", "extra", "gpu"]
Member:

Suggested change:
- features = ["test", "extra", "gpu"]
+ features = ["test", "extra", "cuda12"]

[[tool.hatch.envs.test.matrix]]
python = ["3.10", "3.11", "3.12"]
numpy = ["1.24", "1.26", "2.0"]
features = ["gpu"]
Member:

Suggested change:
- features = ["gpu"]
+ features = ["cuda12"]

"ignore:Creating a zarr.buffer.gpu.*:UserWarning",
]
markers = [
"gpu: mark a test as requiring CuPy and GPU"
Member:

Maybe it is worth using cuda here?

Member:

You're thinking, @jakirkham, that then there could be a multiplicity of these.
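For reference, this is roughly how such a marker would be applied in a test; the test body here is a hypothetical sketch, and only the marker name tracks this discussion.

import pytest

@pytest.mark.gpu  # would become @pytest.mark.cuda if the marker is renamed
def test_gpu_roundtrip() -> None:
    cp = pytest.importorskip("cupy")  # skipped when CuPy is not installed
    arr = cp.arange(16, dtype=cp.uint8)
    assert hasattr(arr, "__cuda_array_interface__")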

Comment on lines 30 to 38
- name: cuda-toolkit
  uses: Jimver/[email protected]
  id: cuda-toolkit
  with:
    cuda: '12.5.0'
  run: |
    echo "Installed cuda version is: ${{steps.cuda-toolkit.outputs.cuda}}"
    echo "Cuda install location: ${{steps.cuda-toolkit.outputs.CUDA_PATH}}"
    nvcc -V
Member:

I'm seeing a GitHub Actions lint error from this section:

[Error](https://github.com./zarr-developers/zarr-python/actions/runs/10569457903/workflow)
a step cannot have both the `uses` and `run` keys

@akshaysubr (author):

Pushed a change that should hopefully fix this. Moved the run portion to the GPU check step instead.

@jhamman (Member) commented on Aug 28, 2024

@akshaysubr - I'm keen to get this in today or tomorrow if possible. It seems like we should just ignore the pre-commit errors once the CI issue noted above is addressed.

@d-v-b merged commit 2f9cf22 into zarr-developers:v3 on Aug 30, 2024 (7 of 9 checks passed)