Default Branch

f96eaddba8 · Revert DFlash SWA optimization (#2039) · Updated 2026-06-26 04:00:09 -05:00

Branches

2a2ea4c9df · MTP tweaks · Updated 2026-05-05 11:16:18 -05:00    jdelony

204
1

710dc0879a · Cleanup · Updated 2026-05-04 07:37:27 -05:00    jdelony

210
2

b2c9fd1524 · Minor MTP improvement · Updated 2026-05-04 00:52:04 -05:00    jdelony

210
1

19a72d91a2 · Change signature of llama_set_draft_input_hidden_state · Updated 2026-05-03 00:21:29 -05:00    jdelony

211
1

0dade85092 · Support Mimo-2.5 · Updated 2026-05-02 10:33:50 -05:00    jdelony

215
1

e55825bdaa · Disable k-shift for split mode graph · Updated 2026-04-30 09:29:50 -05:00    jdelony

220
1

6a6ca2f525 · MTP: better graph reuse · Updated 2026-04-30 06:27:02 -05:00    jdelony

220
1

47b46ba399 · Faster small batch inference for MoE models · Updated 2026-04-29 02:32:39 -05:00    jdelony

223
1

7faad6e42c · Revert "server: defer recurrent-state reset to graph build (addresses #1696 r…" · Updated 2026-04-28 05:33:11 -05:00    jdelony

223
1

9129abd255 · Enable CUDA graphs for AllReduce ops when possible · Updated 2026-04-28 04:31:30 -05:00    jdelony

223
1

1d09b51f64 · Do not create CUDA graphs when disabled · Updated 2026-04-27 09:23:16 -05:00    jdelony

228
1

666f862bd6 · Revert "Faster prompt processing on CUDA (#1687)" · Updated 2026-04-27 06:27:51 -05:00    jdelony

228
1

eb550add1a · Adding forgotten file · Updated 2026-04-24 10:25:10 -05:00    jdelony

231
3

ccd6f1875f · Quantization options · Updated 2026-04-22 08:14:59 -05:00    jdelony

236
1

a4012a50c6 · Make the iq2_ks slow quantization path a compile time option · Updated 2026-04-22 02:45:21 -05:00    jdelony

238
3

ba70752b68 · Add all quantization types to Mistral4 MLA on the CPU · Updated 2026-04-20 07:03:56 -05:00    jdelony

239
1

97369ccd1c · Fix NaNs in Q4_K/Q5_K quantized MiniMax-2.7 models on CUDA · Updated 2026-04-19 06:06:58 -05:00    jdelony

244
1

eb76fa5d0b · Also here · Updated 2026-04-18 12:37:12 -05:00    jdelony

241
2

ebfed8f3fd · Remove unused function · Updated 2026-04-17 04:14:39 -05:00    jdelony

244
2

01a3b4d134 · Disallow speculation for hybrid/recurrent models · Updated 2026-04-16 08:47:39 -05:00    jdelony

249
1