Default Branch

f96eaddba8 · Revert DFlash SWA optimization (#2039) · Updated 2026-06-26 04:00:09 -05:00

Branches

a3fe796f6c · Bitnet: make the scale tensors optional · Updated 2024-10-19 11:37:33 -05:00 · jdelony · 4666 · 3467

0e76d21b96 · Adding agray3's graph caching approach · Updated 2024-10-18 10:01:08 -05:00 · jdelony · 4666 · 3465

e732da1f57 · Attempt to blindly fix Windows build failure · Updated 2024-10-18 04:35:47 -05:00 · jdelony · 4666 · 3465

c4292bf2d9 · iq4_knn: Metal - predictably bad · Updated 2024-10-18 03:48:00 -05:00 · jdelony · 4666 · 3468

9612cd79d6 · iq4_kss: very slightly faster Metal dot product · Updated 2024-10-16 07:08:15 -05:00 · jdelony · 4666 · 3473

3e0c2519d3 · iq4_ks: faster dot product on Metal · Updated 2024-10-16 06:04:59 -05:00 · jdelony · 4666 · 3462

55f91a98f1 · iq3_k: slightly faster Metal dot product · Updated 2024-10-14 02:41:26 -05:00 · jdelony · 4666 · 3461

f74905d649 · iq2_k: optimize Metal dot product · Updated 2024-10-13 06:09:53 -05:00 · jdelony · 4666 · 3461

f9f15c27b6 · iq2_ks: faster Metal · Updated 2024-10-13 04:23:14 -05:00 · jdelony · 4666 · 3470

e441c897a4 · Better model info · Updated 2024-10-10 09:38:59 -05:00 · jdelony · 4666 · 3456

e734e888e1 · iq3_ks: AVX2 · Updated 2024-10-10 02:48:42 -05:00 · jdelony · 4666 · 3463

f61c37967a · iq3_kl: use iq4_ks instead of iq4_k/iq4_xs · Updated 2024-10-09 04:50:43 -05:00 · jdelony · 4666 · 3467

df2bd86a31 · WIP · Updated 2024-10-06 01:09:51 -05:00 · jdelony · 4666 · 3458

acaa4869af · Move scale fudge factors to quantization · Updated 2024-10-04 08:14:52 -05:00 · jdelony · 4666 · 3453

a553eb191a · Make the entire project c++17 · Updated 2024-10-04 06:23:21 -05:00 · jdelony · 4666 · 3453

ed477f1cdc · Do not quantize activations if not necessary also for MoE models · Updated 2024-10-04 03:11:02 -05:00 · jdelony · 4666 · 3452

38eb7fa499 · q6_0: this is slightly better · Updated 2024-10-02 10:07:55 -05:00 · jdelony · 4666 · 3451

a8e932b734 · Fused y*unary(x) op: Metal · Updated 2024-10-02 08:51:29 -05:00 · jdelony · 4666 · 3452

037bbd2d58 · q6_0: can now be used for kv-cache on Metal · Updated 2024-10-02 06:54:25 -05:00 · jdelony · 4666 · 3457

1fb3115412 · iq4_nl: faster quantization · Updated 2024-10-01 23:43:09 -05:00 · jdelony · 4666 · 3447