mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-06-28 04:30:15 -05:00)
* qwen-mtp: add dense mtp for one draft
* add support for smaller qwen mtp commit
* qwen-mtp: fix graph for qwen dense variants
* Squashed commit of the following:
commit a92a154b38c7fddc84460f8852c900f8d6ce907e
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Mon Apr 20 13:30:21 2026 -0300
recurrent model: refactor api
commit dfac8f19f6edc0014b4116041b89c1e0dfb173c7
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Mon Apr 20 12:22:29 2026 -0300
recurrent model: implement recurrent kernel checkpoint
commit 9c44b117f93e9060030907e1250106358c9ccf47
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Sat Apr 18 11:52:39 2026 -0300
speculative: fix sampler for checkpoints
commit e7006393bca20adcd86e2d77021e0b41d7bd9db1
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Fri Apr 17 14:08:25 2026 -0300
server: refactor checkpoint state logic
commit 57eabf04df5185cab19539185b8f4b85e578905b
Merge: dc4797b7 64234e3c
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Fri Apr 17 13:53:41 2026 -0300
Merge branch 'main' into fix/hybrid-cache-speculative
commit dc4797b72363482bb35750bb5edc87068116dc0f
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Fri Apr 17 13:12:40 2026 -0300
reset ngram mod state for rejected tokens
commit 8ff2d943a31b3d54440698db41e62ea121d661be
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Fri Apr 17 13:08:04 2026 -0300
server: snapshot recurrent state in tensor
commit d93dfb5e6b78a822e7331b33feedbdc47eb5ec79
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Thu Apr 16 22:36:37 2026 -0300
fix: save/restore sampler state during speculative checkpoint
When speculative decoding rejects draft tokens and restores the
recurrent state checkpoint, the sampler (RNG, grammar, prev tokens)
must also be restored to maintain consistency. Without this, the
sampler state reflects the rejected draft tokens, leading to
potential divergence.
Uses common_sampler_clone() to snapshot the sampler before the
speculative batch decode, and restores it on rejection.
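A minimal sketch of the snapshot-and-restore pattern described above, not the actual server code. `toy_sampler` and `speculative_step` are hypothetical names introduced for illustration; the struct models only the RNG and previous-token history (grammar state is omitted), and a plain copy stands in for the snapshot that the commit takes with common_sampler_clone().

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the real sampler: only the RNG and the
// history of accepted tokens are modeled here.
struct toy_sampler {
    std::mt19937         rng;
    std::vector<int32_t> prev;

    explicit toy_sampler(uint32_t seed) : rng(seed) {}

    // Draw a token id in [0, n_vocab) and record it. This advances the RNG
    // and grows the history, i.e. it mutates the sampler state.
    int32_t sample_and_accept(int32_t n_vocab) {
        assert(n_vocab > 0);
        std::uniform_int_distribution<int32_t> dist(0, n_vocab - 1);
        const int32_t tok = dist(rng);
        prev.push_back(tok);
        return tok;
    }
};

// Run one speculative step over n_draft tokens (hypothetical helper).
// The sampler is snapshotted before the speculative batch; if the drafts
// are rejected, the snapshot is restored so that no trace of the rejected
// tokens remains in the RNG or the token history.
bool speculative_step(toy_sampler & smpl, int n_draft, bool accept) {
    const toy_sampler snapshot = smpl;   // taken before the speculative batch

    for (int i = 0; i < n_draft; ++i) {
        smpl.sample_and_accept(32000);   // arbitrary vocabulary size for the sketch
    }

    if (!accept) {
        smpl = snapshot;                 // roll back RNG and history on rejection
    }
    return accept;
}
```

After a rejected step, the sampler behaves as if the drafts had never been sampled: its next draw matches that of an identically seeded sampler that skipped the speculative batch.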
commit d670cf85cd59f23a339faf6fd773889063176801
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date: Thu Apr 16 21:53:52 2026 -0300
server: spec checkpoints for recurrent models
* server: fix context leak between requests
* qwen3: allow mtp to run with split graph
* qwen3 mtp: select rows before the ffn