llama.cpp/src at b9266 - llama.cpp - Jared's Git Server

jdelony/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-07-01 00:10:21 -05:00


Daniel Elliott eeeaf6180b

llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#23131)

When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4),
the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs,
self_kq_mask) are created as graph input nodes but never consumed by any compute node,
so the backend scheduler never allocates a buffer for them. Calling
mctx->get_base()->set_input_k_idxs() on an unallocated tensor then hits
GGML_ASSERT(buffer) at ggml-backend.cpp:194.

The same scenario applies symmetrically: if a model had zero SWA layers, the SWA
tensors would be unallocated.

Fix: guard both the base and SWA set_input calls with null/buffer checks, matching
the pattern already used by llm_graph_input_mem_hybrid_iswa::set_input (line ~674)
which has the comment: 'base tensors may not be allocated if there are no non-SWA
attention layers'.

Also fix can_reuse() in the same class to skip the ne[0] and kq_mask checks for
unallocated tensors, preventing a null-dereference on the reuse path.
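
The guard described above can be sketched in isolation. This is a minimal illustration, not the actual patch: `mock_tensor`, `mock_kv_ctx`, `set_input_guarded` and `can_reuse_guarded` are hypothetical stand-ins invented here, and only the idea (skip the write and the shape check when the tensor has no buffer) comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a graph input tensor; NOT the real ggml_tensor.
struct mock_tensor {
    void *  buffer = nullptr;    // stays null if the scheduler never allocated it
    int64_t ne0    = 0;          // first dimension, compared on the reuse path
    std::vector<int32_t> data;   // payload written by set_input
};

// Hypothetical stand-in for one KV cache context (base or SWA).
struct mock_kv_ctx {
    int n_set_calls = 0;

    void set_input_k_idxs(mock_tensor * dst) {
        // Models the failing assertion: a write needs a backing buffer.
        assert(dst->buffer != nullptr);
        dst->data.assign(static_cast<std::size_t>(dst->ne0), 0);
        n_set_calls++;
    }
};

// Write the input only if the tensor exists and was given a buffer.
// Returns true when the write happened, false when it was skipped.
inline bool set_input_guarded(mock_kv_ctx & ctx, mock_tensor * t) {
    if (t == nullptr || t->buffer == nullptr) {
        return false; // no layers consume this input, nothing to fill
    }
    ctx.set_input_k_idxs(t);
    return true;
}

// Reuse check: an unallocated tensor has no shape worth comparing,
// so it never blocks reuse; an allocated one must match n_tokens.
inline bool can_reuse_guarded(const mock_tensor * t, int64_t n_tokens) {
    if (t == nullptr || t->buffer == nullptr) {
        return true;
    }
    return t->ne0 == n_tokens;
}
```

In this sketch the same helper would be called once for the base tensors and once for the SWA tensors, which covers the symmetric case mentioned in the message.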

2026-05-21 09:20:51 +03:00

..

metal : optimize pad + cpy (#23354)

2026-05-20 09:42:00 +03:00

CMakeLists.txt

cmake: use glob to collect src/models sources (#22005)

2026-04-16 23:25:16 +02:00

llama-adapter.cpp

fix: correct misspellings in code comments (#21217)

2026-03-31 13:50:51 +02:00

llama-adapter.h

llama : re-enable manual LoRA adapter free (#19983)

2026-03-18 12:03:26 +02:00

llama-arch.cpp

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-arch.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-batch.cpp

kv-cache : fix M-RoPE checkpoints (#20132)

2026-03-06 08:46:51 +02:00

llama-batch.h

fix: correct misspellings in code comments (#21217)

2026-03-31 13:50:51 +02:00

llama-chat.cpp

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)

2026-05-21 00:35:37 +02:00

llama-chat.h

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)

2026-05-21 00:35:37 +02:00

llama-context.cpp

Move to backend sampling for MTP draft path (#23287)

2026-05-20 22:34:45 +05:30

llama-context.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-cparams.cpp

cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)

2025-06-15 10:08:58 +03:00

llama-cparams.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-ext.h

llama: avoid copying logits during prompt decode in MTP (#23198)

2026-05-17 23:30:25 +08:00

llama-grammar.cpp

common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)

2026-03-21 18:43:35 +01:00

llama-grammar.h

common/grammar : replace problematic backtracking regex [\s\S]* (#18342)

2026-01-03 16:02:43 -06:00

llama-graph.cpp

llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#23131)

2026-05-21 09:20:51 +03:00

llama-graph.h

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-hparams.cpp

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-hparams.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-impl.cpp

llama : correct platform-independent loading of BOOL metadata (#21428)

2026-04-06 01:40:38 +02:00

llama-impl.h

llama : enable chunked fused GDN path (#20340)

2026-03-11 22:46:40 +02:00

llama-io.cpp

server : avoid checkpoint data host copies (#22558)

2026-05-02 18:03:25 +03:00

llama-io.h

llama : add option to save memory in device buffers (#22679)

2026-05-05 06:35:07 +03:00

llama-kv-cache-iswa.cpp

(revert) kv-cache : do not quantize SWA KV cache (#21332)

2026-04-03 09:07:01 +03:00

llama-kv-cache-iswa.h

llama: print memory breakdown on exit (#15860)

2025-09-24 16:53:48 +02:00

llama-kv-cache.cpp

ggml : implement fast walsh-hadamard transform for kv rotation (#21352) (#22631)

2026-05-05 10:05:05 +08:00

llama-kv-cache.h

kv-cache : support attention rotation for heterogeneous iSWA (#21513)

2026-04-07 20:31:28 +03:00

llama-kv-cells.h

llama: store mrope data in KV cell (#16825)

2025-10-29 18:09:18 +01:00

llama-memory-hybrid-iswa.cpp

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory-hybrid-iswa.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-memory-hybrid.cpp

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory-hybrid.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-memory-recurrent.cpp

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory-recurrent.h

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory.cpp

memory : correctly handle failure in apply() (#14438)

2025-06-30 18:03:03 +03:00

llama-memory.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-mmap.cpp

Update llama-mmap to use ftello/fseeko (#22497)

2026-04-30 14:17:52 -07:00

llama-mmap.h

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

llama-model-loader.cpp

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-model-loader.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-model-saver.cpp

model : NvFP4 quantized LM head support (#23046)

2026-05-16 11:09:27 +02:00

llama-model-saver.h

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

llama-model.cpp

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-model.h

model : NvFP4 quantized LM head support (#23046)

2026-05-16 11:09:27 +02:00

llama-quant.cpp

model: move load_hparams and load_tensors to per-model definition (#22004)

2026-05-04 12:36:59 +02:00

llama-quant.h

llama : refactor src/llama.cpp (#10902)

2025-01-03 10:18:53 +02:00

llama-sampler.cpp

llama : rename llama-sampling to llama-sampler (#19363)

2026-02-06 07:26:54 +01:00

llama-sampler.h

llama : rename llama-sampling to llama-sampler (#19363)

2026-02-06 07:26:54 +01:00

llama-vocab.cpp

model : add sarvam_moe architecture support (#20275)

2026-05-09 16:31:50 +02:00

llama-vocab.h

model : add sarvam_moe architecture support (#20275)

2026-05-09 16:31:50 +02:00

llama.cpp

llama : add missing call to ggml_backend_load_all() (#22752)

2026-05-07 08:24:47 +03:00

unicode-data.cpp

server : better security control for public deployments (#9776)

2024-10-08 13:27:04 +02:00

unicode-data.h

llama : reduce compile time and binary size (#9712)

2024-10-02 15:49:55 +02:00

unicode.cpp

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110)

2026-05-14 11:03:40 +02:00

unicode.h

vocab: fix Gemma4 tokenizer (#21343)

2026-04-03 10:33:03 +02:00