Default Branch

f96eaddba8 · Revert DFlash SWA optimization (#2039) · Updated 2026-06-26 04:00:09 -05:00

Branches

e698e85023 · More formatting · Updated 2026-04-02 01:03:08 -05:00    jdelony

286
4

e09a365f0b · Fix unknown tensor type warnings · Updated 2026-04-01 02:38:43 -05:00    jdelony

287
1

1ce09a49ae · Fix BF16 mmproj on the CPU · Updated 2026-04-01 02:03:03 -05:00    jdelony

288
1

97b1a69998 · CPU FA: check if types are supported · Updated 2026-03-31 09:10:41 -05:00    jdelony

290
1

6bae7bc0d0 · Fix re-quantizing a model using row-interleaved quants · Updated 2026-03-31 08:31:09 -05:00    jdelony

291
1

0b935e2bf7 · This is better · Updated 2026-03-31 07:11:12 -05:00    jdelony

291
2

255851ebdf · Even better Q4_0 KV cache (CPU) · Updated 2026-03-30 04:55:15 -05:00    jdelony

295
1

b54d23168a · Enable all CPU-backend FA supported quants by default · Updated 2026-03-29 07:29:43 -05:00    jdelony

296
1

4147c7b2e9 · Even better Q4_0 KV cache · Updated 2026-03-29 05:26:04 -05:00    jdelony

296
1

5dd26f5bfd · Do not override mmap if GGML_CUDA_NO_PINNED is set · Updated 2026-03-29 00:53:20 -05:00    jdelony

299
1

6b0695aff0 · Add --fit to llama-bench · Updated 2026-03-28 10:49:44 -05:00    jdelony

299
1

d157439295 · Honor manual splits · Updated 2026-03-28 05:26:18 -05:00    jdelony

302
1

dee44a053c · Correct available split modes in llama-bench · Updated 2026-03-28 03:26:56 -05:00    jdelony

302
1

d11e1b2caf · V-cache Hadamard transform · Updated 2026-03-27 05:00:16 -05:00    jdelony

302
1

8d74502d1e · Fix CUDA Hadamard transform bug · Updated 2026-03-27 04:33:44 -05:00    jdelony

303
1

c3a33d8b71 · Fix bug in CPU flash attention for bf16 KV cache · Updated 2026-03-26 11:15:50 -05:00    jdelony

305
1

f981514a6a · Typo · Updated 2026-03-26 08:34:17 -05:00    jdelony

306
2

3c26e05d0a · Fix jinja · Updated 2026-03-26 07:10:20 -05:00    jdelony

306
1

6b47967243 · Print info when allocating large amounts of pinned host memory · Updated 2026-03-26 04:35:22 -05:00    jdelony

308
1

0c280a1bd2 · Ignore MTP layer(s) when computing required memory · Updated 2026-03-26 04:00:10 -05:00    jdelony

308
1