ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-06-28 04:30:15 -05:00

Split mode graph for dense Gemma4 assistant (#2022)

* WIP: Split mode graph for Gemma4 assistant

Something is not right - acceptance drops to nearly zero.

* Per model CUDA contexts

Still not working!?

* This works

The issue was that I was not correctly calculating the number
of KV heads for the split KV cache.

* Compiler warnings

* It is better to use llama_context pointers as keys

2026-06-24 18:29:32 +02:00

ggml-alloc.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.h

Fix DFlash performance with split mode graph (#1980)

2026-06-17 18:40:02 +02:00

ggml-cann.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-cpp.h

Port mtmd from mainline + Qwen2/2.5-VL support (#798)