From b08b620c9f7d0febc42620d7f6b2d4d211a54cbf Mon Sep 17 00:00:00 2001
From: Kawrakow
Date: Wed, 18 Mar 2026 14:25:47 +0100
Subject: [PATCH] Update README

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 19be737b..d87b827e 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,9 @@ This repository is a fork of [llama.cpp](https://github.com/ggerganov/llama.cpp)
 >Do not use quantized models from Unsloth that have `_XL` in their name. These are likely to not work with `ik_llama.cpp`.
 >
 >The above has caused some stir, so to clarify: the Unsloth `_XL` models that are likely to not work are those that contain `f16` tensors (which is never a good idea in the first place). All others are fine.
+
+>[!NOTE]
+>Some users have reported issues with graph parallel (a.k.a. split mode `graph`) and partial GPU offload (using `--cpu-moe` or `--n-cpu-moe` or tensor overrides). If you are using/want to use split mode graph and observe gibberish/incoherent responses, try adding `-cuda graphs=0` to your command line.
 
 ## Quickstart