llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-06-27 23:50:20 -05:00)

History

speculative : add heuristic algorithm (#3006)

* Add heuristic algo for speculative

* Constrain minimum n_draft to 2

* speculative : improve heuristic impl

* speculative : be more rewarding upon guessing max drafted tokens

* speculative : fix typos

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2023-09-14 19:14:44 +03:00

CMakeLists.txt

speculative : PoC for speeding-up inference via speculative sampling (#2926)

2023-09-03 15:12:08 +03:00

speculative.cpp

speculative : add heuristic algorithm (#3006)

2023-09-14 19:14:44 +03:00