ik_llama.cpp

Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-06-28 04:30:15 -05:00

Split mode graph for dense Gemma4 assistant (#2022)

* WIP: Split mode graph for Gemma4 assistant

Something is not right - acceptance drops to nearly zero.

* Per model CUDA contexts

Still not working!?

* This works

The issue was that I was not correctly calculating the number
of KV heads for the split KV cache.

* Compiler warnings

* It is better to use llama_context pointers as keys

2026-06-24 18:29:32 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Split mode graph for dense Gemma4 assistant (#2022)

2026-06-24 18:29:32 +02:00

src

Split mode graph for dense Gemma4 assistant (#2022)

2026-06-24 18:29:32 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

cmake: drop ggml-blas.h from GGML_PUBLIC_HEADERS (#2007)

2026-06-21 07:49:09 +02:00