llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-27 23:50:20 -05:00


Georgi Gerganov 5dcb711666

speculative : fix n_outputs_max and remove draft-simple auto-enable (#23988)

* speculative : add common_speculative_n_max helper function

Extract the speculative max-draft-size logic from server_n_outputs_max
into a reusable common_speculative_n_max() function in common/speculative.

Assisted-by: llama.cpp:local pi

* cont : draft context always has n_parallel outputs

* llama : log n_outputs_max

* speculative : remove draft-simple auto-enable

* ci : enable server tests on PRs

2026-06-01 22:26:58 +03:00

actions

ci : clear cache instead of "no timestamp" keys + fix macos (#23895)

2026-05-30 08:52:30 +03:00

ISSUE_TEMPLATE

github: mention --log-file in issue templates (#23277)

2026-05-19 21:35:10 +02:00

workflows

speculative : fix n_outputs_max and remove draft-simple auto-enable (#23988)

2026-06-01 22:26:58 +03:00

labeler.yml

ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming (#23064)

2026-05-16 02:02:40 +02:00

pull_request_template.md

gitignore : add .pi + personal SYSTEM.md (#22316)

2026-04-25 09:20:45 +03:00