llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-27 23:50:20 -05:00


ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753)

* ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul

This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. The final K panel is processed using its actual depth, and the reduced panel size is passed through packing and kernel execution. This allows more workloads to use the MMA kernel and reduces fallback to mnpack.
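The K-tail idea can be illustrated with a minimal sketch. This is plain float C, not the quantized Q8/Q4 Power10 MMA code in tinyBlas_Q0_PPC. The names `pack_panel`, `kernel_panel`, and `matmul_k_tiled` are hypothetical. The sketch assumes A is M x K and B is N x K, both row-major with K contiguous, and it computes C = A * B^T. Instead of requiring K % kc == 0, each panel's depth is `kd = min(kc, K - k0)`, and that depth is passed to both packing and the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packing helper: copy a rows x kd panel, starting at column k0
   of a row-major matrix with leading dimension ld, into a contiguous buffer. */
static void pack_panel(const float *src, size_t ld, size_t rows,
                       size_t k0, size_t kd, float *dst) {
    for (size_t r = 0; r < rows; ++r)
        memcpy(dst + r * kd, src + r * ld + k0, kd * sizeof(float));
}

/* Hypothetical micro-kernel: C[i][j] += dot(Ap row i, Bp row j) over depth kd. */
static void kernel_panel(const float *Ap, const float *Bp, size_t M, size_t N,
                         size_t kd, float *C, size_t ldc) {
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < kd; ++p)
                acc += Ap[i * kd + p] * Bp[j * kd + p];
            C[i * ldc + j] += acc;
        }
}

/* C (M x N) = A (M x K) * B^T, with B stored as N x K.
   K need not be a multiple of kc: the last panel uses kd = K - k0 < kc.
   Returns 0 on success, -1 if the packing buffers cannot be allocated. */
int matmul_k_tiled(const float *A, const float *B, float *C,
                   size_t M, size_t N, size_t K, size_t kc) {
    assert(kc > 0);
    float *Ap = malloc(M * kc * sizeof(float)); /* sized for a full panel */
    float *Bp = malloc(N * kc * sizeof(float));
    if (!Ap || !Bp) {
        free(Ap);
        free(Bp);
        return -1;
    }
    memset(C, 0, M * N * sizeof(float));
    for (size_t k0 = 0; k0 < K; k0 += kc) {
        /* Actual depth of this panel: a full kc, or the shorter K tail. */
        size_t kd = (K - k0 < kc) ? K - k0 : kc;
        pack_panel(A, K, M, k0, kd, Ap);
        pack_panel(B, K, N, k0, kd, Bp);
        kernel_panel(Ap, Bp, M, N, kd, C, N);
    }
    free(Ap);
    free(Bp);
    return 0;
}
```

With K = 5 and kc = 2, the loop runs panels of depth 2, 2, and 1. A version that requires K % kc == 0 would have to reject this shape and fall back to another path. In this sketch, that role would be played by mnpack in the original code.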

* Apply suggestion from @taronaeo

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

---------

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

2026-06-19 08:55:38 +03:00

cmake

ggml : Parallelize quant LUT init (#23595)

2026-05-25 10:15:46 +03:00

include

Remove padding and multiple D2D copies for MTP (#24086)

2026-06-10 23:21:16 +05:30

src

ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753)

2026-06-19 08:55:38 +03:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

[SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO (#24719)

2026-06-18 11:18:26 +03:00