Kawrakow d5507e33ae
Split mode graph for dense Gemma4 assistant (#2022)
* WIP: Split mode graph for Gemma4 assistant

Something is not right - acceptance drops to nearly zero.

* Per model CUDA contexts

Still not working!?

* This works

The issue was that I was not correctly calculating the number
of KV heads for the split KV cache.

* Compiler warnings

* It is better to use llama_context pointers as keys
2026-06-24 18:29:32 +02:00
..
2026-04-23 09:05:39 +02:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2025-06-19 10:24:53 +03:00
2025-12-15 08:27:20 +01:00
2024-08-12 15:14:32 +02:00
2023-03-29 20:21:09 +03:00
2024-07-27 07:55:01 +02:00