Samuel Oliveira Alves 557b674f63
Add llama_context to MTP (#1601)
* wip: separate llama_context for MTP with graph reuse

* wip: fix KV cache desync with separate MTP context

* refactor: remove dead MTP logic, encapsulate KV mirroring

* mtp-context: derive args directly from the main model's context

* mtp: fix kv cache positions

* clean small comments

* minor refactor for context shift
2026-04-09 15:33:56 +02:00