Default Branch

f96eaddba8 · Revert DFlash SWA optimization (#2039) · Updated 2026-06-26 04:00:09 -05:00

Branches

b29f64ea70 · iq4_k: scalar dot product · Updated 2024-07-28 05:09:28 -05:00 · jdelony · 4666 commits behind, 3355 commits ahead

473e280500 · Fusing a mat mul op followed by scale op on the CPU · Updated 2024-07-27 02:45:56 -05:00 · jdelony · 4666 commits behind, 3349 commits ahead

573e5007cd · Remove check · Updated 2024-07-26 10:00:26 -05:00 · jdelony · 4666 commits behind, 3350 commits ahead

ccdb948329 · Offload Bitnet token embeddings to the GPU - the right way · Updated 2024-07-26 05:50:41 -05:00 · jdelony · 4666 commits behind, 3346 commits ahead

db6b0f6dab · Update README with the new CUDA/Meat performance · Updated 2024-07-26 02:06:22 -05:00 · jdelony · 4666 commits behind, 3346 commits ahead

86d94862ae · iqk_soft_max · Updated 2024-07-22 09:34:42 -05:00 · jdelony · 4666 commits behind, 3329 commits ahead

7024ecfeb4 · iq1bn: faster AVX2 · Updated 2024-07-17 02:17:05 -05:00 · jdelony · 4666 commits behind, 3320 commits ahead