SamuelOliveirads
3a1d46c4d1
Merge remote-tracking branch 'origin/main' into feat/dflash-implementation
...
# Conflicts:
# common/common.cpp
# common/speculative.cpp
# convert_hf_to_gguf.py
# examples/server/server-context.cpp
# examples/server/server-context.h
# src/llama-arch.cpp
# src/llama-arch.h
# src/llama-model.cpp
# src/llama.cpp
2026-06-13 17:27:52 -03:00
Joel Farthing
dc51c6f9b2
Add Mellum2 architecture support ( #1919 )
...
Co-authored-by: Joel Farthing <262452229+joelfarthing@users.noreply.github.com>
2026-06-04 14:28:02 +02:00
SamuelOliveirads
82cff238fe
Initial dflash implementation
2026-05-28 18:57:58 -03:00
firecoperana
e15a215e6b
model : Port Minimax M2 from mainline ( #907 )
...
Co-authored-by: firecoperana <firecoperana>
2025-11-06 18:09:24 +02:00
firecoperana
079231c291
model : add grok-2 support ( #782 )
...
Co-authored-by: firecoperana <firecoperana>
2025-09-23 16:31:01 +02:00
Thireus ☠
d65d5fe29e
Add support for GLM-4.5 models ( #668 )
...
* GLM-4.5
* GLM-4.5
* GLM-4.5
* convert_hf_to_gguf.py compatibility bugfix with GLM-4.5
From @ubergarm - https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3145913701
* Add ubergarm comments + my own
* Revert to llama.cpp script version that produced good BF16
See: https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3147374559
* Support for jinja chat templates
See https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3148109962
* GLM-4.5 llama.cpp final port
* Handle TENSOR_SKIP
Ported the changes from:
f129567dc0
dcbbd2cb05
Except op info since ik_llama.cpp doesn't support this operation.
* Bugfix for TENSOR_SKIP
skip loading if a tensor has the TENSOR_SKIP flag - @ubergarm via https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3155297198
* Update llama.cpp
Restore original GGML_ASSERT
* Fix chat template detection
Changes suggested by @ubergarm - https://github.com/ikawrakow/ik_llama.cpp/pull/668#issuecomment-3155927840
* Revert to original GGML_ASSERT
2025-08-07 07:55:00 +03:00
ubergarm
d3ed217798
kimi-k2 convert script and chat template ( #612 )
...
* convert_hf_to_gguf for Kimi-K2-Instruct
Adapt mainline `PR14653` for tokenizer while maintaining proper MLA
tensors. Tested with this workflow using deepseek fp8_cast_bf16.py and
triton-cpu to upcast the fp8 safetensors to bf16 safetensors then used
this convert_hf_to_gguf.
* Add Kimi-K2 chat template
moonshotai/Kimi-K2-Instruct
https://github.com/ikawrakow/ik_llama.cpp/pull/609#issuecomment-3071259454
* kimi-k2 add assistant to template to get response
2025-07-15 19:54:04 +02:00
Fizz~
27ff5bf57e
Special handling of Seed Coder FIM tokens ( #585 )
...
* Special handling of Seed Coder FIM tokens
* vocab: Add Seed Coder pretokenizer
* Formatting fix
* Update llama.h
2025-07-06 12:13:55 +02:00
saood06
5c0a01bdaf
Deepseek V3 support added ( #176 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-01-23 18:24:10 +02:00
Kawrakow
0ceeb11721
Merge mainline llama.cpp ( #3 )
...
* Merging mainline - WIP
* Merging mainline - WIP
AVX2 and CUDA appear to work.
CUDA performance seems slightly (~1-2%) lower, as is so often
the case with llama.cpp/ggml after some "improvements" have been made.
* Merging mainline - fix Metal
* Remove check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-07-27 07:55:01 +02:00