Samuel Oliveira Alves
007d640098
Standardize speculative decoding arguments on the server ( #1908 )
...
* refactor spec args
* add shell-safe quoting of string-valued stage keys in speculative decoding
2026-06-04 15:44:57 +02:00
Samuel Oliveira Alves
be8435793e
Pre-allocate buffers for hybrid model checkpoints ( #1774 )
...
* hybrid-spec: improve recurrent checkpoint handling in speculative decoding
* change per-step save to support scheduling and asynchronous tensor operations
* remove redundant backend tensor fallback
* improve recurrent tensor handling for split graph
2026-05-12 07:21:25 +03:00
Samuel Oliveira Alves
260622faf6
Self-decoding: Adds support for suffix decoding ( #1646 )
...
* speculative: implement suffix-tree decoder
* speculative: add support to cache and tuner
2026-04-18 16:10:10 +02:00
Samuel Oliveira Alves
3de81530c5
Allow tuning of the best args for speculative decoding. ( #1595 )
...
* wip: build spec tuner for specific args
* wip: test different reward system
* spec-tune: fix the reward to find best params given a good TPS
* spec-tune: refactor logic for its own file
* minor cleanup of comments and modules
2026-04-08 08:02:42 +02:00