Samuel Oliveira Alves 67e6346225
Support for Qwen 3.5 MTP (dense models only) (#1698)
* qwen-mtp: add dense mtp for one draft

* add support for smaller qwen mtp commit

* qwen-mtp: fix graph for qwen dense variants

* Squashed commit of the following:

commit a92a154b38c7fddc84460f8852c900f8d6ce907e
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Mon Apr 20 13:30:21 2026 -0300

    recurrent model: refactor api

commit dfac8f19f6edc0014b4116041b89c1e0dfb173c7
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Mon Apr 20 12:22:29 2026 -0300

    recurrent model: implement recurrent kernel checkpoint

commit 9c44b117f93e9060030907e1250106358c9ccf47
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Sat Apr 18 11:52:39 2026 -0300

    speculative: fix sampler for checkpoints

commit e7006393bca20adcd86e2d77021e0b41d7bd9db1
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Fri Apr 17 14:08:25 2026 -0300

    server: refactor checkpoint state logic

commit 57eabf04df5185cab19539185b8f4b85e578905b
Merge: dc4797b7 64234e3c
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Fri Apr 17 13:53:41 2026 -0300

    Merge branch 'main' into fix/hybrid-cache-speculative

commit dc4797b72363482bb35750bb5edc87068116dc0f
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Fri Apr 17 13:12:40 2026 -0300

    reset ngram mod state for rejected tokens

commit 8ff2d943a31b3d54440698db41e62ea121d661be
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Fri Apr 17 13:08:04 2026 -0300

    server: snapshot recurrent state in tensor

commit d93dfb5e6b78a822e7331b33feedbdc47eb5ec79
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Thu Apr 16 22:36:37 2026 -0300

    fix: save/restore sampler state during speculative checkpoint

    When speculative decoding rejects draft tokens and restores the
    recurrent state checkpoint, the sampler (RNG, grammar, prev tokens)
    must also be restored to maintain consistency. Without this, the
    sampler state reflects the rejected draft tokens, leading to
    potential divergence.

    Uses common_sampler_clone() to snapshot the sampler before the
    speculative batch decode, and restores it on rejection.
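
The snapshot-and-restore idea described above can be illustrated with a minimal, self-contained sketch. It does not use the real sampler API: `ToySampler` and `verify_draft` are hypothetical names invented for this example, the toy sampler models only the RNG and the previous-token history (no grammar state), and a plain copy stands in for `common_sampler_clone()`.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a sampler: an RNG plus the history of sampled tokens.
// The real sampler also carries grammar state, which is omitted here.
struct ToySampler {
    std::mt19937 rng;
    std::vector<int32_t> prev;

    explicit ToySampler(uint32_t seed) : rng(seed) {}

    // Draw a token id in [0, n_vocab) and record it.
    // This mutates both the RNG state and the token history.
    int32_t sample(int32_t n_vocab) {
        assert(n_vocab > 0);
        std::uniform_int_distribution<int32_t> dist(0, n_vocab - 1);
        const int32_t tok = dist(rng);
        prev.push_back(tok);
        return tok;
    }
};

// Hypothetical helper: run the sampler over n_draft speculative positions.
// If the draft is rejected, roll the sampler back to the snapshot taken
// before the speculative batch. Returns whether the draft was accepted.
bool verify_draft(ToySampler & smpl, int n_draft, int32_t n_vocab, bool accept) {
    const ToySampler snapshot = smpl;   // snapshot before the speculative batch
    for (int i = 0; i < n_draft; ++i) {
        smpl.sample(n_vocab);           // advances RNG and history
    }
    if (!accept) {
        smpl = snapshot;                // restore on rejection
    }
    return accept;
}
```

In this sketch, a sampler that goes through a rejected draft ends up in the same state as one that never saw the draft, so its next sample matches. Without the restore step, the RNG and history would still reflect the rejected tokens.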

commit d670cf85cd59f23a339faf6fd773889063176801
Author: SamuelOliveirads <samueloliveira32df@gmail.com>
Date:   Thu Apr 16 21:53:52 2026 -0300

    server: spec checkpoints for recurrent models

* server: fix context leak between requests

* qwen3: allow mtp to run with split graph

* qwen3 mtp: select rows before the ffn
2026-04-28 07:47:50 +02:00