llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-27 23:50:20 -05:00


rehan-10xengineer 3c7450cee1

ggml-cpu: extend RVV quantization vec dot to higher VLENs (#22754)

* ggml-cpu: add rvv 512b,1024b impls for iq4_xs

* ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants

* ggml-cpu: refactor; add 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s, iq2_xs, iq2_xxs

improve iq2_xs impl for rvv 256

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

---------

Co-authored-by: taimur-10x <taimur.ahmad@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

2026-06-04 08:03:40 +03:00

cmake

ggml : Parallelize quant LUT init (#23595)

2026-05-25 10:15:46 +03:00

include

TP: quantized KV cache support (#23792)

2026-06-01 12:30:10 +02:00

src

ggml-cpu: extend RVV quantization vec dot to higher VLENs (#22754)

2026-06-04 08:03:40 +03:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.13.1 (ggml/1523)

2026-05-29 09:56:08 +03:00