Hi, first of all, thank you for this tool; it's a very useful and interesting approach to running models on low-resource hardware.
I was wondering whether you have any plans to add a way to run it as a service, where the whole model is not loaded every time a new prompt is provided. Something like llama-server?
I did try to run a model quantized for BitNet with llama-server, but it seems they are not compatible; the sketch below shows roughly what I attempted.
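For reference, this is roughly what I tried (the model path here is just a placeholder for a BitNet-quantized GGUF):

```sh
# Rough sketch of the attempt; the model path is illustrative.
# llama-server keeps the model resident and serves requests over HTTP,
# which is the behaviour I am looking for:
llama-server -m models/bitnet-model.gguf --port 8080 -c 2048

# then, from another shell, something like:
curl http://localhost:8080/completion -d '{"prompt": "Hello", "n_predict": 32}'
```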
Do you have any comments or suggestions? Thank you in advance.

Luca
Replies: 1 comment

- Any news about it?