Kawrakow d5507e33ae
Split mode graph for dense Gemma4 assistant (#2022)
* WIP: Split mode graph for Gemma4 assistant

Something is not right - acceptance drops to nearly zero.

* Per model CUDA contexts

Still not working!?

* This works

The issue was that I was not correctly calculating the number
of KV heads for the split KV cache.

* Compiler warnings

* It is better to use llama_context pointers as keys
2026-06-24 18:29:32 +02:00
..
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00