llama_context
* llama: save more VRAM by reserving n_outputs == n_seqs when possible
* add n_outputs_per_seq
* move n_outputs_max to server-context
* change ubatch to batch everywhere
-fa auto
tools/ui
ui
UI
llama-ui
LLAMA_UI