Hi, first of all, thank you for this tool; it's a very useful and interesting approach to running models on low-resource hardware.
I was wondering whether you have any plans to add a way to run it as a service, where the whole model is not loaded every time a new prompt is provided. Something like llama-server?
I did try to run a model quantized for BitNet with llama-server, but it seems they are not compatible; the sketch below shows roughly what I attempted.
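For reference, this is roughly what I tried (the model path here is just a placeholder for a BitNet-quantized GGUF):

```sh
# Rough sketch of the attempt; the model path is illustrative.
# llama-server keeps the model resident and serves requests over HTTP,
# which is the behaviour I am looking for:
llama-server -m models/bitnet-model.gguf --port 8080 -c 2048

# then, from another shell, something like:
curl http://localhost:8080/completion -d '{"prompt": "Hello", "n_predict": 32}'
```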
Do you have any comments or suggestions? Thank you in advance.

Luca
Replies: 1 comment

- Any news about it?