# Bump torchao + add unit tests for torchao kernels #9396
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9396. Note: links to docs will display an error until the docs builds have been completed.

As of commit ad95321 with merge base 7a2a300: ❌ 1 new failure, 1 cancelled job (please retry), 2 pending.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks 🙏🏻
-d fp32

# Test run
./cmake-out/examples/models/llama/llama_main --model_path=$MODEL_OUT --tokenizer_path=$TOKENIZER --prompt="Once upon a time,"
Should we do a simple non-brittle sanity check? e.g. output length > 0 or something
I think the main concern is whether it runs. If it runs, it will produce output.
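For concreteness, here is a minimal sketch of the kind of non-brittle check discussed above, written in Python for illustration. It assumes the MODEL_OUT and TOKENIZER environment variables from the quoted test command are set; how such a check would be wired into the CI job is an assumption, not part of this PR.

```python
import os
import subprocess

# Paths mirror the quoted test command; MODEL_OUT and TOKENIZER are assumed
# to be exported by the surrounding CI script.
cmd = [
    "./cmake-out/examples/models/llama/llama_main",
    f"--model_path={os.environ['MODEL_OUT']}",
    f"--tokenizer_path={os.environ['TOKENIZER']}",
    "--prompt=Once upon a time,",
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)

# Non-brittle sanity check: only require that the run produced some output.
assert len(result.stdout.strip()) > 0, "llama_main produced no output"
print(f"Sanity check passed: {len(result.stdout)} characters of output.")
```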
examples/models/llama/README.md (Outdated)

@@ -380,6 +380,79 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de

### Android

Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-android.html) for full instructions on building the Android LLAMA Demo App.

## Running with low-bit kernels

We now give instructions for quantizing and running your model with low-bit kernels. These kernels are still experimental and require development on an Arm-based Mac. Also note that low-bit quantization often requires QAT (quantization-aware training) to give good quality results.
Might be worth saying that these don't work with dynamic shapes yet
Added
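For readers unfamiliar with torchao quantization, the sketch below illustrates the general quantize_ pattern that the low-bit kernels build on, using torchao's int8 dynamic activation / int4 weight config. The model, shapes, and group size are placeholders, and the sub-4-bit experimental configs and export flags documented by this PR are not reproduced here.

```python
import torch
from torchao.quantization.quant_api import (
    quantize_,
    int8_dynamic_activation_int4_weight,
)

# Placeholder model: any torch.nn.Module containing Linear layers would do.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
)

# Quantize in place: int8 dynamic activations with group-wise int4 weights.
# group_size=32 is an illustrative choice, not a value taken from this PR.
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))

# The quantized model can then be exported and lowered with ExecuTorch as usual.
with torch.no_grad():
    out = model(torch.randn(1, 4096))
print(out.shape)
```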
### Summary

This PR bumps the torchao pin and adds unit tests and documentation for the low-bit torchao kernels.

### Test plan

New CI test.