ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-06-28 04:30:15 -05:00

Author	SHA1	Message	Date
SamuelOliveirads	3a1d46c4d1	Merge remote-tracking branch 'origin/main' into feat/dflash-implementation # Conflicts: # common/common.cpp # common/speculative.cpp # convert_hf_to_gguf.py # examples/server/server-context.cpp # examples/server/server-context.h # src/llama-arch.cpp # src/llama-arch.h # src/llama-model.cpp # src/llama.cpp	2026-06-13 17:27:52 -03:00
Joel Farthing	dc51c6f9b2	Add Mellum2 architecture support (#1919) Co-authored-by: Joel Farthing <262452229+joelfarthing@users.noreply.github.com>	2026-06-04 14:28:02 +02:00
SamuelOliveirads	82cff238fe	Initial dflash implementation	2026-05-28 18:57:58 -03:00
firecoperana	e15a215e6b	model : Port Minimax M2 from mainline (#907) Co-authored-by: firecoperana <firecoperana>	2025-11-06 18:09:24 +02:00
firecoperana	079231c291	model : add grok-2 support (#782) Co-authored-by: firecoperana <firecoperana>	2025-09-23 16:31:01 +02:00
Thireus ☠	d65d5fe29e	Add support for GLM-4.5 models (#668) * GLM-4.5 * GLM-4.5 * GLM-4.5 * convert_hf_to_gguf.py compatibility bugfix with GLM-4.5 From @ubergarm - https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3145913701 * Add ubergarm comments + my own * Revert to llama.cpp script version that produced good BF16 See: https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3147374559 * Support for jinja chat templates See https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3148109962 * GLM-4.5 llama.cpp final port * Handle TENSOR_SKIP Ported the changes from: `f129567dc0` `dcbbd2cb05` Except op info since ik_llama.cpp doesn't support this operation. * Bugfix for TENSOR_SKIP skip loading if a tensor has the TENSOR_SKIP flag - @ubergarm via https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3155297198 * Update llama.cpp Restore original GGML_ASSERT * Fix chat template detection Changes suggested by @ubergarm - https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3155927840 * Revert to original GGML_ASSERT	2025-08-07 07:55:00 +03:00
ubergarm	d3ed217798	kimi-k2 convert script and chat template (#612) * convert_hf_to_gguf for Kimi-K2-Instruct Adapt mainline `PR14653` for tokenizer while maintaining proper MLA tensors. Tested with this workflow using deepseek fp8_cast_bf16.py and triton-cpu to upcast the fp8 safetensors to bf16 safetensors then used this convert_hf_to_gguf. * Add Kimi-K2 chat template moonshotai/Kimi-K2-Instruct https://github.com/ikawrakow/ik_llama.cpp/pull/609#issuecomment-3071259454 * kimi-k2 add ass to template to get response	2025-07-15 19:54:04 +02:00
Fizz~	27ff5bf57e	Special handling of Seed Coder FIM tokens (#585) * Special handling of Seed Coder FIM tokens * vocab: Add Seed Coder pretokenizer * Formatting fix * Update llama.h	2025-07-06 12:13:55 +02:00
saood06	5c0a01bdaf	Deepseek V3 support added (#176) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2025-01-23 18:24:10 +02:00
Kawrakow	0ceeb11721	Merge mainline llama.cpp (#3) * Merging mainline - WIP * Merging mainline - WIP AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower as it is so often the case with llama.cpp/ggml after some "improvements" have been made. * Merging mainline - fix Metal * Remove check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-07-27 07:55:01 +02:00

10 Commits