ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-06-28 04:30:15 -05:00

Kawrakow 847e191936

* Use build_std_attention for Gemma4 when possible

It is possible for the 26b MoE and 31b dense models.
It is not possible for the E4B/E2B variants because they
don't have KV cache in each layer.

* Standardize Gemma4 dense ffn

* WIP: Gemma4 split mode graph

Runs but produces NaNs

* WIP: Gemma4 split mode graph

Runs but very high PPL. At least it is no longer NaN.

* WIP

* This works!

* Put attn_norm, attn_post_norm, ffn_norm, ffn_post_norm on all GPUs

* Fix crash when saving/loading KV cache

* WIP: split mode graph for Gemma4-MoE - crashes

* Split mode graph for Gemma4-MoE - this works

* Disable SWA optimization

Something goes wrong there

* Consolidate MoE and dense graph parallel

2026-04-09 14:07:29 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Bonsai support (AVX2, generic) (#1570)

2026-04-02 16:54:08 +02:00

src

Graph parallel for Gemma4 MoE (#1600)

2026-04-09 14:07:29 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Enable all CPU-backend FA supported quants by default (#1549)

2026-03-29 14:36:09 +02:00