llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-27 23:50:20 -05:00

opencl: improve get_rows, cpy, concat and q6_k flat gemv (#24160)

* opencl: allow multiple workgroups for large rows

* opencl: improve small cpy

* opencl: packed concat for small input

* opencl: tweak flat q6_K gemv, increase N_DST and remap threads

2026-06-05 13:45:25 -07:00

cmake

ggml : Parallelize quant LUT init (#23595)

2026-05-25 10:15:46 +03:00

include

TP: quantized KV cache support (#23792)

2026-06-01 12:30:10 +02:00

src

opencl: improve get_rows, cpy, concat and q6_k flat gemv (#24160)

2026-06-05 13:45:25 -07:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.13.1 (ggml/1523)

2026-05-29 09:56:08 +03:00