2 Commits

Author SHA1 Message Date
Samuel Oliveira Alves
be8435793e
Pre-allocate buffers for hybrid model checkpoints (#1774)
* hybrid-spec: improve recurrent checkpoint handling in speculative decoding

* change per-step save to support scheduling and asynchronous tensor operations

* remove redundant backend tensor fallback

* improve recurrent tensor handling for split graph
2026-05-12 07:21:25 +03:00
Samuel Oliveira Alves
3de81530c5
Allow tuning of the best args for speculative decoding. (#1595)
* wip: build spec tuner for specific args

* wip: test different reward system

* spec-tune: fix the reward to find the best params given a good TPS

* spec-tune: refactor logic into its own file

* minor cleanup of comments and modules
2026-04-08 08:02:42 +02:00