Replies: 1 comment
The macros are defined in
I'm trying to replace the mixed-precision quantized GEMM CUDA kernel in llama.cpp with my own implementation. To do that, I need to understand the data layout and computation logic of the mul_mat_vec_q kernel.
I tried reading the code but couldn't follow it because of the large number of unfamiliar variables and the complex parallel computation.
How should I approach understanding this code? What do qk, qi, and vdr mean, and how does the kernel work?
I'm really stuck. Can anyone help?