mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-06-28 04:30:15 -05:00
* Use build_std_attention for Gemma4 when possible

  It is possible for the 26b MoE and 31b dense models.
  It is not possible for the E4B/E2B variants because they
  don't have a KV cache in each layer.

* Standardize Gemma4 dense ffn

* WIP: Gemma4 split mode graph

  Runs but produces NaNs

* WIP: Gemma4 split mode graph

  Runs but very high PPL. At least it is no longer NaN.

* WIP

* This works!

* Put attn_norm, attn_post_norm, ffn_norm, ffn_post_norm on all GPUs

* Fix crash when saving/loading KV cache

* WIP: split mode graph for Gemma4-MoE - crashes

* Split mode graph for Gemma4-MoE - this works

* Disable SWA optimization

  Something goes wrong there

* Consolidate MoE and dense graph parallel