Performance considerations for Dataset.iter #7511
Unanswered
wittenator asked this question in Q&A
Other frameworks such as PyTorch let you specify the number of workers for a DataLoader in order to preload batches and keep GPU utilization high. Does anyone have experience with how this works with Hugging Face Datasets and the iter method? I am currently using Hugging Face Datasets with JAX output and I see GPU utilization alternating between busy and idle, with a lot of time spent on memory access even when the dataset is loaded entirely into memory. I suspect that data loading is one of the issues at play. There is a similar issue from two years ago, #6341, but I am curious whether anything has changed since then.
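A minimal sketch of one way to approximate PyTorch-style worker prefetching with a Hugging Face Dataset, assuming an in-memory dataset and that pulling in a PyTorch DataLoader is acceptable; the dataset contents, column name, worker counts, and `train_step` are placeholders:

```python
import jax.numpy as jnp
import numpy as np
from datasets import Dataset
from torch.utils.data import DataLoader

# Hypothetical in-memory dataset standing in for the real one.
ds = Dataset.from_dict({"x": np.random.rand(10_000, 16).tolist()}).with_format("numpy")

# Wrap the Dataset in a PyTorch DataLoader so that worker processes
# prefetch and collate batches while the accelerator is busy.
loader = DataLoader(
    ds,
    batch_size=256,
    num_workers=4,       # number of prefetching worker processes
    prefetch_factor=2,   # batches buffered per worker
    shuffle=True,
)

for batch in loader:
    # The default collate_fn returns torch tensors; convert to JAX explicitly.
    x = jnp.asarray(batch["x"].numpy())
    # train_step(x)  # hypothetical jitted training step
```

The point of the sketch is simply that batch preparation moves into separate worker processes, so iterating on the main process is no longer the bottleneck.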
Replies: 1 comment

Just for reference: it seems that using the JAX formatting option for the loader interferes with JAX's async dispatch. Switching to the NumPy formatter almost doubled the iteration speed and greatly reduced the time the GPU spent waiting.
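For concreteness, a minimal sketch of the two variants being compared; `ds` and `train_step` are placeholders, and the batch size is arbitrary:

```python
import jax.numpy as jnp
from datasets import Dataset

# Hypothetical toy dataset for illustration.
ds = Dataset.from_dict({"x": [[float(i)] * 8 for i in range(10_000)]})

# Variant 1: JAX formatting -- Datasets yields jax.Array batches directly.
# In the runs described above, this appeared to block JAX's async dispatch.
for batch in ds.with_format("jax").iter(batch_size=256):
    x = batch["x"]                # already a jax.Array on the default device
    # train_step(x)               # hypothetical jitted step

# Variant 2: NumPy formatting -- iterate as NumPy and transfer explicitly.
# This was the faster option in the measurements above.
for batch in ds.with_format("numpy").iter(batch_size=256):
    x = jnp.asarray(batch["x"])   # host-to-device copy happens here
    # train_step(x)
```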