Don't Serialize Scales/ZP in Flatbuffer #9029
Labels
good first issue
Good for newcomers
module: xnnpack
Issues related to xnnpack delegation and the code under backends/xnnpack/
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Other forms of data, such as weights and biases, are serialized separately (either in the data store or in the segment at the end of the flatbuffer). Scales and zero points (ZP), however, are serialized straight into the flatbuffer. This was fine when we only did per-tensor and per-channel quantization, because the number of scales was small, but with blockwise quantization the number of scales can be large. Realistically, since scales and ZP are just another form of data, we should store them in the same place as the weights and biases.
Essentially, we want to move the serialization of scales/ZP from this:
executorch/backends/xnnpack/operators/node_visitor.py
Line 278 in 0c6a71b
to something like this:
executorch/backends/xnnpack/operators/node_visitor.py
Line 496 in 0c6a71b
We should only do this for zero points/scales that are tensors or lists. For per-tensor quantization with a single ZP/scale, serializing them separately is overkill, so we should leave those alone.
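To make the intended split concrete, here is a minimal, self-contained sketch of the decision described above. It does not use the real ExecuTorch/XNNPACK serialization code: `ConstantStore`, `QuantParamsRecord`, and `serialize_scales` are hypothetical names invented for illustration, and the float32 little-endian layout is an assumption. The idea is only that a single scalar scale stays inline, while a list of scales goes into the same out-of-line buffer as other constant data and is referenced by offset.

```python
import struct
from dataclasses import dataclass, field
from typing import Optional, Sequence, Tuple, Union


@dataclass
class ConstantStore:
    """Hypothetical stand-in for wherever weight/bias bytes are stored."""

    blob: bytearray = field(default_factory=bytearray)

    def append(self, payload: bytes) -> Tuple[int, int]:
        # Append the payload and return (offset, size) so it can be referenced.
        offset = len(self.blob)
        self.blob.extend(payload)
        return offset, len(payload)


@dataclass
class QuantParamsRecord:
    """Hypothetical record; either the inline value or the offset is set."""

    inline_scale: Optional[float] = None
    scale_offset: Optional[int] = None
    scale_nbytes: int = 0


def serialize_scales(
    scales: Union[float, Sequence[float]], store: ConstantStore
) -> QuantParamsRecord:
    # Per-tensor case: a single scalar is cheap, so keep it inline.
    if isinstance(scales, (int, float)):
        return QuantParamsRecord(inline_scale=float(scales))

    # Per-channel / blockwise case: pack as float32 (assumed layout)
    # and store the bytes alongside other constant data.
    values = [float(s) for s in scales]
    payload = struct.pack(f"<{len(values)}f", *values)
    offset, nbytes = store.append(payload)
    return QuantParamsRecord(scale_offset=offset, scale_nbytes=nbytes)
```

With this shape, the record kept in the flatbuffer stays a fixed size regardless of how many blockwise scales there are; zero points could be handled the same way with an integer pack format.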
cc @digantdesai @cbilgin