
Initial GPU support #1967


Merged: 51 commits, Aug 30, 2024

Conversation

@akshaysubr (Contributor) commented on Jun 14, 2024

Adding an initial implementation to support GPU arrays. It is currently limited to arrays that support the __cuda_array_interface__ protocol (CuPy, Numba, PyTorch). This can be extended later to DLPack for JAX and TensorFlow support.
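As a rough, hedged illustration (CuPy-based, not code from this PR), these are the arrays in scope: anything exposing the CUDA Array Interface attribute.

import cupy as cp  # the optional [gpu] dependency

# GPU arrays targeted by this PR advertise the CUDA Array Interface,
# which describes the device pointer, shape, dtype, and strides.
gpu_arr = cp.arange(16, dtype=cp.float32)
assert hasattr(gpu_arr, "__cuda_array_interface__")
print(gpu_arr.__cuda_array_interface__["typestr"], gpu_arr.__cuda_array_interface__["shape"])

# JAX and TensorFlow arrays do not expose this attribute; supporting them
# would go through DLPack instead, as noted above.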

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@d-v-b (Contributor) commented on Jun 14, 2024

Thanks for working on this @akshaysubr. What would we need to change on the CI side to run GPU tests?

@rabernat (Contributor) left a comment

Very excited about this! Looks really good! I now understand better why we went to the trouble of creating these different buffer protocols.

Here's some random low-priority feedback based on a first read through.

@akshaysubr (Contributor, author):

@d-v-b For CI, a GPU runner would be needed and the pipeline should install the optional [gpu] dependencies (just cupy).

@rabernat (Contributor):

I would have thought that this line

dependency-set: ["minimal", "optional"]

would have picked up this new optional dependency:

gpu = [
"cupy>=13.0.0",
]

But that didn't happen in this CI run: https://github.com./zarr-developers/zarr-python/actions/runs/9555282123/job/26338638922?pr=1967#step:5:1

@akshaysubr akshaysubr force-pushed the gpu-buffer-implementation branch from f67a68e to f1f4029 Compare June 18, 2024 05:45
@akshaysubr akshaysubr marked this pull request as ready for review June 18, 2024 05:47
@madsbk (Contributor) left a comment

Looks good, I only have minor suggestions.

But I wonder if we should move all of this to its own file: gpu_buffer.py?

@d-v-b (Contributor) commented on Jun 18, 2024

But I wonder if we should move all of this to its own file: gpu_buffer.py ?

I could imagine something like this:

src/zarr/buffer 
|- gpu.py
|   |- Buffer
|- cpu.py
|   |- Buffer

leading to usage like

from zarr.buffer import cpu, gpu
...
cpu.Buffer()
...
gpu.Buffer()

@akshaysubr akshaysubr force-pushed the gpu-buffer-implementation branch from a87cc33 to 864b774 Compare June 28, 2024 06:26
@madsbk (Contributor) left a comment

Looks good, nice work.
I have a minor suggestion, and I think the GPU buffer should get its own file like @d-v-b suggests: #1967 (comment)

@jhamman jhamman added the V3 label Jul 1, 2024
@akshaysubr akshaysubr force-pushed the gpu-buffer-implementation branch from f232d79 to 4e18098 Compare July 8, 2024 23:34
@akshaysubr (Contributor, author):

@d-v-b Thanks for the suggestion to refactor into two separate files. I made those changes and added some more that I think benefit the overall usage of these buffer classes. Essentially, zarr.buffer.Buffer is now an abstract class, and zarr.buffer.cpu.Buffer and zarr.buffer.gpu.Buffer are concrete implementations, which makes things cleaner. It also requires implementations to be explicit about which type of Buffer they intend to use, which I think is a good thing.
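A hedged sketch of the shape of this refactor (simplified names and signatures, not the exact code in the PR):

from abc import ABC, abstractmethod
import numpy as np

class Buffer(ABC):  # plays the role of zarr.buffer.Buffer (abstract base)
    def __init__(self, array) -> None:
        self._data = array

    @classmethod
    @abstractmethod
    def from_bytes(cls, data: bytes) -> "Buffer": ...

class CPUBuffer(Buffer):  # plays the role of zarr.buffer.cpu.Buffer (NumPy-backed)
    @classmethod
    def from_bytes(cls, data: bytes) -> "CPUBuffer":
        return cls(np.frombuffer(data, dtype="B"))

class GPUBuffer(Buffer):  # plays the role of zarr.buffer.gpu.Buffer (CuPy-backed)
    @classmethod
    def from_bytes(cls, data: bytes) -> "GPUBuffer":
        import cupy as cp
        return cls(cp.asarray(np.frombuffer(data, dtype="B")))  # host -> device copy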


@classmethod
@abstractmethod
Contributor:

Test failures are due to this method becoming abstract while it still gets called in to_buffer_dict in metadata.py. Two things should change to fix this: first, I think the order of the decorators here should maybe be flipped to ensure that Buffer.from_bytes can't be called without an exception, and second, metadata.py needs to stop calling Buffer methods directly.

@akshaysubr (author):

I'm not sure we can flip the order of decorators here based on this snippet from the abstractmethod docs:

When abstractmethod() is applied in combination with other method descriptors, it should be applied as the innermost decorator

I agree though that metadata.py shouldn't be calling any Buffer methods and should instead be calling prototype.buffer methods or default_buffer_prototype.buffer methods. Is there a preference between propagating a prototype argument up the call stack or just using default_buffer_prototype since these to_bytes calls are not in the critical performance path?
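A small standalone illustration of the constraint being discussed (not zarr code): the documented order keeps abstractmethod innermost, and marking a classmethod abstract blocks instantiating the base class but does not by itself make direct calls on it raise.

from abc import ABC, abstractmethod

class Base(ABC):
    @classmethod
    @abstractmethod  # abstractmethod must be the innermost decorator
    def from_bytes(cls, data: bytes) -> "Base":
        raise NotImplementedError("use a concrete subclass")

try:
    Base()  # TypeError: can't instantiate abstract class Base
except TypeError:
    pass

try:
    Base.from_bytes(b"")  # raises only because the body raises explicitly
except NotImplementedError:
    pass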

@akshaysubr (author):

Resolved the CI issues by tracking down all calls to abstract classmethods of Buffer and NDBuffer. Here are the main ones and the current solution (a sketch of the substitution follows the list):

  • metadata.py: switch to using default_buffer_prototype.buffer
  • group.py: switch to using default_buffer_prototype.buffer
  • sharding.py: switch to using default_buffer_prototype.buffer
  • codecs/_v2.py: switch to using cpu.Buffer and cpu.NDBuffer since these are all explicitly CPU only.
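A simplified sketch of what that substitution looks like; the BufferPrototype and default_buffer_prototype spellings here are schematic stand-ins, and the real definitions are the ones in the PR:

from typing import NamedTuple

class Buffer: ...      # stand-in for the concrete cpu.Buffer
class NDBuffer: ...    # stand-in for the concrete cpu.NDBuffer

class BufferPrototype(NamedTuple):
    buffer: type[Buffer]
    nd_buffer: type[NDBuffer]

default_buffer_prototype = BufferPrototype(buffer=Buffer, nd_buffer=NDBuffer)

# Call sites in metadata.py / group.py / sharding.py then go through
#     default_buffer_prototype.buffer.from_bytes(...)
# rather than calling the abstract Buffer.from_bytes classmethod directly.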

matrix:
  python-version: ['3.10', '3.11', '3.12']
  numpy-version: ['1.24', '1.26', '2.0']
  dependency-set: ["minimal", "optional"]
@akshaysubr (author):

@jhamman What test matrix do we want for GPU testing? The current config (3 Python versions x 3 NumPy versions x 2 dependency sets = 18 GPU jobs) seems a bit excessive; it might be worth cutting it down to the bare minimum to keep CI costs down.

- name: Install Hatch and CuPy
  run: |
    python -m pip install --upgrade pip
    pip install hatch cupy-cuda12x
@akshaysubr (author):

Manually adding this here for now. Should make this part of the hatch workflow later.

Member:

Yeah there's probably a better way to handle this. Happy to chat when this is ready for discussion

@akshaysubr (author):

This is the only thing remaining for this PR, so any solutions here would be great!

Member:

Great, did an initial pass below

We should be able to use the self-referential extra with hatch as well

Comment on lines 24 to 26
python-version: ['3.10', '3.11', '3.12']
numpy-version: ['1.24', '1.26', '2.0']
dependency-set: ["minimal", "optional"]
@jhamman (Member) commented on Aug 23, 2024

What if we did something like this for now:

Suggested change:
- python-version: ['3.10', '3.11', '3.12']
- numpy-version: ['1.24', '1.26', '2.0']
- dependency-set: ["minimal", "optional"]
+ python-version: ['3.11']
+ numpy-version: ['2.0']
+ dependency-set: ["gpu"]

The gpu dependency set would need to be defined in pyproject.toml

@akshaysubr (author):

This would be ideal. There is currently a gpu dependency set in pyproject.toml, but I'm not sure how to get hatch to pick it up.

@jakirkham (Member) left a comment

Thanks Akshay! 🙏

I think it would be worthwhile to put these under cuda12.

I don't know if we want CUDA 11 at this stage (probably not, given this is new and CUDA 11 is getting older). Though I do think putting the CUDA 12 dependencies under that name will make it easier to upgrade in the future.

If we do use this naming, we may want to reflect something similar elsewhere (cuda for when the version doesn't matter and cuda12 where it does).

Comment on lines +77 to +79
gpu = [
"cupy-cuda12x",
]
Member:

Suggested change:
- gpu = [
-   "cupy-cuda12x",
- ]
+ cuda12 = [
+   "cupy-cuda12x",
+ ]

pyproject.toml (outdated)
  ]
- features = ["test", "extra"]
+ features = ["test", "extra", "gpu"]
Member:

Suggested change:
- features = ["test", "extra", "gpu"]
+ features = ["test", "extra", "cuda12"]

[[tool.hatch.envs.test.matrix]]
python = ["3.10", "3.11", "3.12"]
numpy = ["1.24", "1.26", "2.0"]
features = ["gpu"]
Member:

Suggested change:
- features = ["gpu"]
+ features = ["cuda12"]

"ignore:Creating a zarr.buffer.gpu.*:UserWarning",
]
markers = [
"gpu: mark a test as requiring CuPy and GPU"
Member:

Maybe it is worth using cuda here?

Member:

You're thinking, @jakirkham, that then there could be a multiplicity of these.
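For reference, this is roughly how such a marker would be applied in a test; the test body here is a hypothetical sketch, and only the marker name tracks this discussion.

import pytest

@pytest.mark.gpu  # would become @pytest.mark.cuda if the marker is renamed
def test_gpu_roundtrip() -> None:
    cp = pytest.importorskip("cupy")  # skipped when CuPy is not installed
    arr = cp.arange(16, dtype=cp.uint8)
    assert hasattr(arr, "__cuda_array_interface__")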

Comment on lines 30 to 38
- name: cuda-toolkit
  uses: Jimver/[email protected]
  id: cuda-toolkit
  with:
    cuda: '12.5.0'
  run: |
    echo "Installed cuda version is: ${{steps.cuda-toolkit.outputs.cuda}}"
    echo "Cuda install location: ${{steps.cuda-toolkit.outputs.CUDA_PATH}}"
    nvcc -V
Member:

I'm seeing a GitHub Actions lint error from this section:

[Error](https://github.com./zarr-developers/zarr-python/actions/runs/10569457903/workflow)
a step cannot have both the `uses` and `run` keys

@akshaysubr (author):

Pushed a change that should hopefully fix this. Moved the run portion to the GPU check step instead.

@jhamman (Member) commented on Aug 28, 2024

@akshaysubr - I'm keen to get this in today or tomorrow if possible. It seems like we should just ignore the pre-commit errors once the CI issue noted above is addressed.

@d-v-b merged commit 2f9cf22 into zarr-developers:v3 on Aug 30, 2024 (7 of 9 checks passed)