Default Branch

f96eaddba8 · Revert DFlash SWA optimization (#2039) · Updated 2026-06-26 04:00:09 -05:00

Branches

0b3d85fe3a · Minor · Updated 2026-06-10 09:48:15 -05:00

90 behind · 2 ahead

decfaf4dd3 · A few more named nodes · Updated 2026-06-10 04:31:31 -05:00

92 behind · 2 ahead

2741330db5 · Adjust CUDA FA kernel parameters for head size 512 on Turing · Updated 2026-06-09 10:04:10 -05:00

96 behind · 1 ahead

14e960489c · More · Updated 2026-06-09 08:05:24 -05:00

96 behind · 2 ahead

44e8cf1ab3 · Split mode graph for Laguna · Updated 2026-06-09 00:33:55 -05:00

99 behind · 1 ahead

17f05fc6ec · Fix bf16 graph reduce type · Updated 2026-06-08 08:35:20 -05:00

101 behind · 1 ahead

1a685f1af1 · Support for alternative Gemma4 assistant · Updated 2026-06-08 07:17:26 -05:00

101 behind · 1 ahead

98cf5cef72 · CPU FA: disable mask optimization · Updated 2026-06-08 01:18:41 -05:00

101 behind · 1 ahead

9b4b9ca4ae · CUDA FA: cover Gemma4-4B/2B assistant · Updated 2026-06-08 00:37:15 -05:00

104 behind · 1 ahead

c3b975eb04 · CPU FA: Check for empty attention mask · Updated 2026-06-05 04:21:15 -05:00

106 behind · 1 ahead

68a94ab930 · Enable split mode graph for Gemma4-12B · Updated 2026-06-04 11:12:53 -05:00

108 behind · 1 ahead

0ad43359a4 · Split mode graph for Mellum · Updated 2026-06-04 08:13:33 -05:00

113 behind · 1 ahead

adeff7dbd3 · Add extra nodes when dealing with MLA and amb · Updated 2026-05-29 03:38:59 -05:00

117 behind · 1 ahead

ccc48d33c7 · quantize: add exception for Gemma4 · Updated 2026-05-29 01:10:15 -05:00

118 behind · 1 ahead

20c2f6d97f · Add lower bound to the -amb command line argument · Updated 2026-05-29 00:22:20 -05:00

119 behind · 1 ahead

6648aa2e6e · Fix Gemma4 vision · Updated 2026-05-28 10:08:46 -05:00

118 behind · 0 ahead · Included

a18eeb01cb · Qwen3.5 MTP: extract selected tokens earlier · Updated 2026-05-28 06:50:51 -05:00

119 behind · 1 ahead

7cf668f797 · Make MTP work with split mode graph · Updated 2026-05-27 23:12:34 -05:00

120 behind · 47 ahead

5a10d701f9 · Arghh · Updated 2026-05-27 08:49:58 -05:00

120 behind · 4 ahead

68d818269e · Fix GLM MTP with split mode graph · Updated 2026-05-26 11:30:12 -05:00

124 behind · 2 ahead