fix crash on non-AVX systems dynamically loading GGML CPU backends #11780


Merged 1 commit into ggml-org:master on Feb 13, 2025

Conversation

jmorganca (Contributor)

Thanks for the awesome work by @slaren in #10469 (and a few follow-up PRs) to enable dynamic GGML backend loading. This made supporting different CPU instruction sets in GGML much, much easier.

I noticed a small hitch with the llamafile code: a machine with a non-AVX CPU would crash when trying to dlopen CPU libraries built with GGML_LLAMAFILE=ON. This moves the AVX-dependent code into a member variable, fixing the crash on dlopen. I'm not sure how sgemm.cpp is vendored, so let me know the best way/place to suggest a change.
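For context, here is a minimal sketch of the failure mode and the fix; the names are illustrative, not the actual sgemm.cpp symbols:

```cpp
#include <immintrin.h>

// BEFORE (sketch): a namespace-scope object whose dynamic initializer
// executes AVX instructions runs at dlopen time, before any CPU feature
// check, so a non-AVX machine hits SIGILL just by loading the library:
//
//   static const __m256 kLookup = _mm256_loadu_ps(kTable);

// AFTER (sketch): move the value into the class, so the AVX load only
// executes once the runtime has already verified AVX support and
// instantiates the AVX-specific kernel.
class tinyBLAS_avx_sketch {  // hypothetical name
  public:
    explicit tinyBLAS_avx_sketch(const float *table)
        : lookup(_mm256_loadu_ps(table)) {}  // AVX runs here, not at dlopen
  private:
    __m256 lookup;  // per-instance copy of the former global
};
```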

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Feb 10, 2025
slaren (Member) left a comment:

Thanks, I missed this global. The fix looks ok, but if the code is not inlined it may add some overhead to the other types. I will leave this open for a while in case someone knowledgeable about llamafile/tinyblas wants to propose a better solution.
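To illustrate the concern, a sketch under assumed structure (not the real tinyblas code): when the lookup was a file-scope constant, the compiler could fold or hoist it freely; as a member, a non-inlined kernel re-reads it through `this` on every call, and every type that goes through the class pays for the load.

```cpp
#include <immintrin.h>

// Hypothetical names; forces non-inlining just to show the per-call cost.
struct kernel_sketch {
    __m256 lookup;  // formerly a global constant, now per-instance state
    __attribute__((noinline))
    __m256 step(__m256 x) const {
        return _mm256_add_ps(x, lookup);  // extra memory load of this->lookup
    }
};
```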

slaren merged commit 8a8c4ce into ggml-org:master on Feb 13, 2025
46 checks passed
jmorganca (Contributor, Author)

Thanks for merging, @slaren. I'm running some performance tests after noticing ollama/ollama#9087. I'm not sure this PR is the root cause, but I haven't ruled it out yet. Either way, I'll keep you posted; I wanted to give you a heads up in case it turns out to be related.

slaren (Member) commented on Feb 14, 2025:

The llamafile tinyblas code should only be used for prompt processing, so if you are also observing a decrease in performance during generation, it is unlikely that it was caused by this change.
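One way to separate the two phases (assuming a recent llama.cpp build; exact flags may differ across versions) is `llama-bench`, which reports prompt-processing (pp) and token-generation (tg) throughput independently, e.g. `llama-bench -m model.gguf -p 512 -n 128`. A regression confined to the pp numbers would point at the tinyblas path, while a tg regression would suggest another cause.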

orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025