Default Branch

f96eaddba8 · Revert DFlash SWA optimization (#2039) · Updated 2026-06-26 04:00:09 -05:00

Branches

0440345ba9 · Revert DFlash SWA optimization · Updated 2026-06-26 03:58:50 -05:00

1
1

a4e408611d · Minor DFlash tweaks · Updated 2026-06-25 10:10:16 -05:00

5
1

e1670f6c6c · Merge remote-tracking branch 'origin/main' into ik/qwen35_mtp_smgraph · Updated 2026-06-24 11:32:10 -05:00

9
7

1f5828eaa4 · It is better to use llama_context pointers as keys · Updated 2026-06-24 08:53:59 -05:00

11
5

9283af5ed8 · Avoid Gemma4 assistant strange tensor name warnings · Updated 2026-06-24 04:20:41 -05:00

11
1

3476dd6a40 · server: variance based checkpoint eviction · Updated 2026-06-23 21:41:08 -05:00

17
1

3cf0f5468f · Also these · Updated 2026-06-19 10:24:24 -05:00

29
2

e734b76632 · Force Gemma4 assistant to be loaded on last GPU · Updated 2026-06-19 08:51:11 -05:00

29
2

d1692e1951 · Allow graph reuse for Gemma4 MTP · Updated 2026-06-19 04:34:45 -05:00

29
1

25d91dea44 · Add compatibility for llama.cpp Gemma4 assistant GGUFs · Updated 2026-06-19 02:50:26 -05:00

30
1

67b0b22760 · Fix Gemma4 MTP compute graph · Updated 2026-06-18 10:51:22 -05:00

34
2

2c1dc8781b · Fix MTP warmup for GLM models · Updated 2026-06-18 08:15:10 -05:00

34
1

dc81d79cb6 · Provide API to get the model arch string · Updated 2026-06-17 11:18:32 -05:00

39
4

5b9c3bbc3b · Fix DFlash performance with split mode graph · Updated 2026-06-17 00:46:05 -05:00

39
1

6f45163a95 · Fix DFlash on the CPU · Updated 2026-06-16 08:22:36 -05:00

42
0
Included

6be3a488d3 · CUDA FA: faster TG when GQA is 16 and head size is 128 · Updated 2026-06-15 06:46:02 -05:00

63
0
Included

c24d50dd88 · Split mode graph for MiniMax-M3 · Updated 2026-06-15 03:41:34 -05:00

71
0
Included

c73bfbe9ce · Fix #1961 · Updated 2026-06-14 02:42:39 -05:00

77
0
Included

175819b4fb · Style · Updated 2026-06-12 01:19:06 -05:00

85
0
Included

c622ea37d3 · More info · Updated 2026-06-11 09:06:07 -05:00

88
2