llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-27 23:50:20 -05:00

llama: limit max outputs of llama_context (#23861)

* llama: save more VRAM by reserving n_outputs == n_seqs when possible

* add n_outputs_per_seq

* move n_outputs_max to server-context

* change ubatch to batch everywhere

2026-06-01 18:01:38 +03:00

llama-cpp.h

llama : re-enable manual LoRA adapter free (#19983)

2026-03-18 12:03:26 +02:00

llama.h

llama: limit max outputs of llama_context (#23861)

2026-06-01 18:01:38 +03:00