Default Branch

f96eaddba8 · Revert DFlash SWA optimization (#2039) · Updated 2026-06-26 04:00:09 -05:00

Branches

5b6999970e · Fix Q5_0 flash attention · Updated 2024-10-01 07:49:03 -05:00 · jdelony

4666 commits behind · 3446 commits ahead

09789d017f · Be able to use IQ4_NL for KV cache on ARM_NEON · Updated 2024-10-01 06:43:33 -05:00 · jdelony

4666 commits behind · 3445 commits ahead

f265260f23 · Merge remote-tracking branch 'origin/main' into ik/cuda_faster_iq4nl_kvcache · Updated 2024-10-01 04:26:53 -05:00 · jdelony

4666 commits behind · 3445 commits ahead

a6b097c1b1 · Fix AVX2 · Updated 2024-10-01 02:54:58 -05:00 · jdelony

4666 commits behind · 3443 commits ahead

cd1002670c · POC SVD: try involving the quantized weights. · Updated 2024-10-01 00:58:42 -05:00 · jdelony

4666 commits behind · 3448 commits ahead

5f3f3bb09e · iqk_mul_mat: better srategy when nrc_y not divisible by ny · Updated 2024-10-01 00:12:29 -05:00 · jdelony

4666 commits behind · 3441 commits ahead

d12d0e9b04 · Allow bf16 kv-cache · Updated 2024-09-29 00:42:33 -05:00 · jdelony

4666 commits behind · 3440 commits ahead

c294485f45 · Time to fix replace_all · Updated 2024-09-28 09:43:54 -05:00 · jdelony

4666 commits behind · 3439 commits ahead

147f9606d0 · CUDA non-contiguous RoPE · Updated 2024-09-28 06:37:28 -05:00 · jdelony

4666 commits behind · 3438 commits ahead

05cb629007 · GGML_UNARY_OP_SWIGLU: cleanup · Updated 2024-09-28 05:36:27 -05:00 · jdelony

4666 commits behind · 3441 commits ahead

a8f37b61ee · Better sub-3-bit quantization mixes with a qkv tensor · Updated 2024-09-28 00:09:42 -05:00 · jdelony

4666 commits behind · 3436 commits ahead

d913611605 · Play with barriers · Updated 2024-09-25 11:04:11 -05:00 · jdelony

4666 commits behind · 3437 commits ahead

0bade93228 · Update IQ1_TN and IQ2_TN bpw shown to user · Updated 2024-09-25 05:27:39 -05:00 · jdelony

4666 commits behind · 3443 commits ahead

95d9f3c103 · Use fp32 for K*Q in Metal FA implementation · Updated 2024-09-25 05:04:10 -05:00 · jdelony

4666 commits behind · 3434 commits ahead

75ac624a7a · Fix warnings in iqk_quantize.cpp · Updated 2024-09-17 06:22:37 -05:00 · jdelony

4666 commits behind · 3435 commits ahead

5065dcd4a0 · Playing with hsums · Updated 2024-09-17 04:12:54 -05:00 · jdelony

4666 commits behind · 3435 commits ahead

8e80d15930 · Faster BF16 Metal dot product · Updated 2024-09-16 10:32:48 -05:00 · jdelony

4666 commits behind · 3432 commits ahead

e6d3b6b277 · iqk_mul_mat(ARM_NEON): adding bf16 support · Updated 2024-09-16 08:40:18 -05:00 · jdelony

4666 commits behind · 3430 commits ahead

6bfd4511f9 · Adapt to latest master · Updated 2024-09-14 11:58:39 -05:00 · jdelony

4666 commits behind · 3430 commits ahead

698c2094bb · Improve Q5_0 performance · Updated 2024-09-14 09:19:27 -05:00 · jdelony

4666 commits behind · 3427 commits ahead