Tarek Dakhran
88636e178f
model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M ( #24913 )
...
* model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M
* Restore LFM2 models in README.md
2026-06-24 09:49:46 +03:00
Georgi Gerganov
7acb4e8cd2
hparams : refactor hparams.n_layer ( #24060 )
...
* hparams : refactor hparams.n_layer
* cont : remove `n_layer_kv()`, use n_layer_all instead
* cont : type consistency
* pi : update SYSTEM.md
* models : fix Step3.5 MTP
* cont : remove duplicate switch cases
* cont : explicitly set `false` to extra layers for `is_swa` and `is_recr`
* cont : fix nextn layer count handling
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-06-05 11:09:36 +03:00
Georgi Gerganov
06938ac129
tests : add support for qwen3 SSM archs ( #24031 )
...
* tests : add support for qwen3 SSM archs
* arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS
* cont : naming + TODOs
2026-06-03 10:15:27 +03:00
ynankani
42928bc14d
model : NvFP4 quantized LM head support ( #23046 )
...
* NvFP4 quantized LM head support
Signed-off-by: ynankani <ynankani@nvidia.com>
* Address review commnets
Signed-off-by: ynankani <ynankani@nvidia.com>
* Add assert for NvFp4 lm head and tied embeddings
Signed-off-by: ynankani <ynankani@nvidia.com>
* Address review commnets
Signed-off-by: ynankani <ynankani@nvidia.com>
* Create output_s tensor only when LM head NvFp4
Signed-off-by: ynankani <ynankani@nvidia.com>
---------
Signed-off-by: ynankani <ynankani@nvidia.com>
2026-05-16 11:09:27 +02:00
Xuan-Son Nguyen
994118a183
model: move load_hparams and load_tensors to per-model definition ( #22004 )
...
* git-friendly migration
* add build_graph
* nits
* exclude old code from build
* wip
* add llm_arch_model_i
* prepare downstream functions
* nits
* nits
* wip
* wip
* add back create_tensor_qkv
* fix files missing include
* enforce one llm_build per arch
* cmake: use glob
* missing model params
* nits
* wip
* wip (2)
* wip (3)
* test-llama-archs is happy
* improve switch case
* move more stuff into llm_arch_model_i
* fix downstream code
* nits
* nits (2)
* fix order
* llama_model_base
* LLAMA_LOAD_LOCALS
* small fix
* fix build errors
* auto
* rm migration script and ifdef
2026-05-04 12:36:59 +02:00
PikaPikachu
9db77a020c
model : refactor QKV into common build_qkv and create_tensor_qkv helpers ( #21245 )
...
* model : refactor QKV into common build_qkv and create_tensor_qkv helpers
* model : extend build_qkv to bert/mpt/dbrx/olmo/lfm2/nemotron-h/granite-hybrid/gemma3n-iswa/t5-dec and fix wqkv_s
2026-04-16 17:41:34 +02:00
Sigbjørn Skjæret
f772f6e434
model : support NVFP4 tensors for Gemma4 ( #21971 )
...
* support nvfp4 tensors for Gemma4
* add wo_s to build_attn
* add wo_s to build_attn
* fix glm4
2026-04-16 16:51:47 +02:00
Michael Grau
6729d4920c
model : add control vector support where missing ( #20653 )
...
* Add control vector functions to qwen3.5 and qwen-next models
* Add missing cvec compatibility to the rest of the models
* Adjust comments and formatting
* cleanup
* whitespace
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-18 23:25:12 +01:00
Xuan-Son Nguyen
59db9a357d
llama: dynamic head_dim and n_rot for SWA ( #20301 )
...
* llama: dynamic head_dim and n_rot for SWA
* also add gguf_writer wrappers
* fix build
* build_rope_shift arg reorder
2026-03-09 22:22:39 +01:00
Sigbjørn Skjæret
35bee031e1
graph : remove redundant scale_w parameter ( #20235 )
2026-03-08 18:58:28 +01:00
Tarek Dakhran
8004f3a8d1
model : add tokenizer from LFM2.5-Audio-1.5B ( #19687 )
...
* model : Add tokenizer from LFM2.5-Audio-1.5B
[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B ) introduced lightweight audio tokenizer.
Tokenizer based on LFM2 architecture and acts as "embedding" model with
different input `n_embd` and output `n_embd_out`.
To be used in https://github.com/ggml-org/llama.cpp/pull/18641 .
To convert use
```shell
python3 convert_hf_to_gguf.py /path/to/LFM2.5-Audio-1.5B/audio_detokenizer
```
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Formatting
* Rework check for attention layers
* Add LFM2 SWA model support
* Address PR feedback
* Set vocab to none
* Move helper function definitions to cpp file
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-19 09:54:48 +01:00
Georgi Gerganov
6783b11fb0
models : fix LFM2 tensors ( #17548 )
2025-11-27 16:04:29 +02:00
Piotr Wilkin (ilintar)
bea04522ff
refactor : llama-model.cpp ( #16252 )
...
* Sqashed: llama-model.cpp refactoring
* Fix formatting of attn / ffn / ffn_moe calls
* Fix import regression / unify spacing in models.h
* totally DID NOT miss those!
* Add missing qwen3vl(moe) models
* Add missing new .cpp files to build
* Remove extra semicolons
* Editor checker
* Update src/models/models.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-31 23:40:23 +01:00