ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-06-28 04:30:15 -05:00

Nexes the Elder 9eaf86a7c7

Fix minor CUDA discrepancies (#2005)

* CUDA : typo

* CUDA: Add missing GGML_CALL to function definition

* CUDA: only log GGML_CUDA_FORCE_MMQ/CUBLAS when enabled

* CUDA: Fix softcap bug in flash_attn_tile_ext_f16

The else branch (softcap != 0) incorrectly called launch_fattn_tile_f16_64_128
with use_softcap=false instead of true, causing logit softcap to be silently
ignored for the col_per_block=32, parallel_blocks=1 path.

2026-06-23 09:37:48 +02:00
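
The softcap fix described in the commit message above comes down to a dispatch that passed the wrong compile-time flag. The following is only a minimal host-side sketch of that pattern, not the actual kernel code: `apply_logit`, `dispatch_buggy`, and `dispatch_fixed` are hypothetical names, and the `softcap * tanh(logit / softcap)` form is assumed as the usual definition of logit softcapping.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a launcher templated on use_softcap
// (the real code launches a CUDA kernel; this is plain host C++).
template <bool use_softcap>
float apply_logit(float logit, float softcap) {
    if (use_softcap) {
        // Assumed softcap definition: squash the logit into (-softcap, softcap).
        return softcap * std::tanh(logit / softcap);
    }
    return logit;
}

// Buggy dispatch: the softcap != 0 branch still selects use_softcap=false,
// so the cap is silently ignored.
float dispatch_buggy(float logit, float softcap) {
    if (softcap == 0.0f) {
        return apply_logit<false>(logit, softcap);
    } else {
        return apply_logit<false>(logit, softcap); // should be <true>
    }
}

// Fixed dispatch: the softcap != 0 branch selects use_softcap=true.
float dispatch_fixed(float logit, float softcap) {
    if (softcap == 0.0f) {
        return apply_logit<false>(logit, softcap);
    } else {
        return apply_logit<true>(logit, softcap);
    }
}

// Small self-check contrasting the two dispatchers.
void check_dispatch() {
    assert(dispatch_buggy(100.0f, 30.0f) == 100.0f); // cap ignored
    assert(dispatch_fixed(100.0f, 30.0f) < 30.0f);   // cap applied
    assert(dispatch_fixed(5.0f, 0.0f) == 5.0f);      // softcap disabled
}
```

Calling `check_dispatch()` exercises both paths: with a softcap of 30, the buggy dispatcher returns the logit unchanged, while the fixed one keeps it below the cap.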

ggml-alloc.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-backend.h

Fix DFlash performance with split mode graph (#1980)

2026-06-17 18:40:02 +02:00

ggml-cann.h

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

ggml-cpp.h

Port mtmd from mainline + Qwen2/2.5-VL support (#798)