Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-06-28 04:30:15 -05:00

Commit b08b620c9f (parent 9015b6c51d): Update README

@@ -14,6 +14,9 @@ This repository is a fork of [llama.cpp](https://github.com/ggerganov/llama.cpp)
>Do not use quantized models from Unsloth that have `_XL` in their name. These are likely not to work with `ik_llama.cpp`.
>
>The above has caused some stir, so to clarify: the Unsloth `_XL` models that are likely not to work are those that contain `f16` tensors (which is never a good idea in the first place). All other `_XL` models are fine.
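
To check whether a given `_XL` GGUF file falls into this category, you can list its `f16` tensors. The sketch below is a minimal, stdlib-only Python reader, not part of `ik_llama.cpp`; it assumes the GGUF v2/v3 header layout (little-endian, 64-bit counts and string lengths) and the ggml type id `1` for F16, and `f16_tensor_names` is a hypothetical helper name. The `gguf` Python package maintained in the llama.cpp repository provides a complete reader if you prefer not to parse the header yourself.

```python
import struct
from typing import BinaryIO, List

# Assumptions (not taken from ik_llama.cpp): GGUF v2/v3 layout, little-endian,
# 64-bit tensor/metadata counts and string lengths; ggml type id 1 == F16.
GGUF_MAGIC = b"GGUF"
GGML_TYPE_F16 = 1

# Sizes in bytes of the fixed-width GGUF metadata value types, keyed by type id
# (uint8, int8, uint16, int16, uint32, int32, float32, bool, uint64, int64, float64).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_TYPE_STRING = 8
_TYPE_ARRAY = 9


def _read(f: BinaryIO, fmt: str) -> tuple:
    """Read exactly struct.calcsize(fmt) bytes and unpack them."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF file")
    return struct.unpack(fmt, data)


def _read_string(f: BinaryIO) -> str:
    """A GGUF string is a uint64 byte length followed by UTF-8 bytes."""
    (length,) = _read(f, "<Q")
    return _read(f, f"<{length}s")[0].decode("utf-8")


def _skip_value(f: BinaryIO, value_type: int) -> None:
    """Skip one metadata value; arrays carry an element type and a count."""
    if value_type in _SCALAR_SIZES:
        _read(f, f"<{_SCALAR_SIZES[value_type]}s")
    elif value_type == _TYPE_STRING:
        _read_string(f)
    elif value_type == _TYPE_ARRAY:
        elem_type, count = _read(f, "<IQ")
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {value_type}")


def f16_tensor_names(f: BinaryIO) -> List[str]:
    """Hypothetical helper: return the names of all F16 tensors in a GGUF stream."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    (version,) = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    tensor_count, kv_count = _read(f, "<QQ")
    # Metadata key/value pairs come first; we only need to step over them.
    for _ in range(kv_count):
        _read_string(f)
        (value_type,) = _read(f, "<I")
        _skip_value(f, value_type)
    # Each tensor info record: name, n_dims, dims[n_dims], ggml type, data offset.
    names = []
    for _ in range(tensor_count):
        name = _read_string(f)
        (n_dims,) = _read(f, "<I")
        _read(f, f"<{n_dims}Q")
        ggml_type, _offset = _read(f, "<IQ")
        if ggml_type == GGML_TYPE_F16:
            names.append(name)
    return names
```

Usage: open the model with `open("model.gguf", "rb")` (placeholder path) and pass the file object to `f16_tensor_names`. Only the header and tensor-info records are read, so this stays fast even for very large files. A non-empty result suggests the file is one of the `_XL` variants described above.
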
>[!NOTE]
>Some users have reported issues when graph parallel (a.k.a. split mode `graph`) is combined with partial GPU offload (via `--cpu-moe`, `--n-cpu-moe`, or tensor overrides). If you use (or want to use) split mode `graph` and observe gibberish or incoherent responses, try adding `-cuda graphs=0` to your command line.
## Quickstart