Ruixiang Wang
|
88a39274ec
|
spec: add EAGLE3 speculative decoding support (#18039)
* llama : enable layer input extraction
* spec: support eagle3
* eagle3: fix params bug
* eagle3: support Gemma4 eagle3 from RedHatAI
* eagle3: set sync when get features from target
Co-authored-by: tnhnyzc <115956684+tnhnyzc@users.noreply.github.com>
* eagle3 : fix ubatch handling in embd_layer_inp extraction and encoder
Co-authored-by: Doğaç Eldenk <dogacel@gmail.com>
* eagle3: adapt to upstream changes
* eagle3: fix rebase issues and adapt to upstream changes
* eagle3: exclude the eagle3 arch from test-llama-archs
* eagle3: fix editorconfig check failures
* eagle3: fix multi-seq issue in d2t vocab mapping
* cont : minor style / clean-up
* spec : remove `common_speculative_setup_draft_model()`
* llama : clean-up unused API
* eagle3: set d2t vocab mapping in decode graph
* cont : assert layer inputs are configured
* hparams : use n_embd_inp instead of n_embd_target_features
* eagle3: make output.weight optional and inherit from target model when needed
* hparams : generic norm-before-residual param
* llama-ext : consistent names
* cont : fix
* hparams : remove target_hidden_size
* cparams : rename output_layer_inp -> embeddings_layer_inp
* arch : reuse ATTN_NORM_2 instead of adding new hidden norm
* llama : clean-up names
* cont : add assert + comment
* Update conversion/llama.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: tnhnyzc <115956684+tnhnyzc@users.noreply.github.com>
Co-authored-by: Doğaç Eldenk <dogacel@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2026-06-12 10:21:06 +03:00 |
|
Aman Gupta
|
04eb4c446d
|
llama : add Gemma4 MTP (#23398)
|
2026-06-07 20:50:54 +08:00 |
|
Georgi Gerganov
|
7acb4e8cd2
|
hparams : refactor hparams.n_layer (#24060)
* hparams : refactor hparams.n_layer
* cont : remove `n_layer_kv()`, use n_layer_all instead
* cont : type consistency
* pi : update SYSTEM.md
* models : fix Step3.5 MTP
* cont : remove duplicate switch cases
* cont : explicitly set `false` to extra layers for `is_swa` and `is_recr`
* cont : fix nextn layer count handling
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2026-06-05 11:09:36 +03:00 |
|
Xuan-Son Nguyen
|
a731805ced
|
mtmd, model: allow skip build_vit() (#24077)
* add model
* nits
|
2026-06-03 17:10:35 +02:00 |
|
Georgi Gerganov
|
06938ac129
|
tests : add support for qwen3 SSM archs (#24031)
* tests : add support for qwen3 SSM archs
* arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS
* cont : naming + TODOs
|
2026-06-03 10:15:27 +03:00 |
|
ynankani
|
42928bc14d
|
model : NvFP4 quantized LM head support (#23046)
* NvFP4 quantized LM head support
Signed-off-by: ynankani <ynankani@nvidia.com>
* Address review comments
Signed-off-by: ynankani <ynankani@nvidia.com>
* Add assert for NvFp4 lm head and tied embeddings
Signed-off-by: ynankani <ynankani@nvidia.com>
* Address review comments
Signed-off-by: ynankani <ynankani@nvidia.com>
* Create output_s tensor only when LM head NvFp4
Signed-off-by: ynankani <ynankani@nvidia.com>
---------
Signed-off-by: ynankani <ynankani@nvidia.com>
|
2026-05-16 11:09:27 +02:00 |
|
ynankani
|
9f5f0e689c
|
model : support Gemma4_26B_A4B_NVFP4 (#22804)
* Gemma4_26B_A4B_NvFp4 hf checkpoint convert to gguf format fixes
Signed-off-by: ynankani <ynankani@nvidia.com>
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Address review comments
Signed-off-by: ynankani <ynankani@nvidia.com>
* fix CRLF
Signed-off-by: ynankani <ynankani@nvidia.com>
* Lint error fix
Signed-off-by: ynankani <ynankani@nvidia.com>
---------
Signed-off-by: ynankani <ynankani@nvidia.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2026-05-08 20:42:09 +02:00 |
|
Xuan-Son Nguyen
|
994118a183
|
model: move load_hparams and load_tensors to per-model definition (#22004)
* git-friendly migration
* add build_graph
* nits
* exclude old code from build
* wip
* add llm_arch_model_i
* prepare downstream functions
* nits
* nits
* wip
* wip
* add back create_tensor_qkv
* fix files missing include
* enforce one llm_build per arch
* cmake: use glob
* missing model params
* nits
* wip
* wip (2)
* wip (3)
* test-llama-archs is happy
* improve switch case
* move more stuff into llm_arch_model_i
* fix downstream code
* nits
* nits (2)
* fix order
* llama_model_base
* LLAMA_LOAD_LOCALS
* small fix
* fix build errors
* auto
* rm migration script and ifdef
|
2026-05-04 12:36:59 +02:00 |
|