9662 Commits

Author SHA1 Message Date
Todd Malsbary
4196b477da
sycl : Make GGML_SYCL_F16=ON the default (#23996)
* Add -cl-fp32-correctly-rounded-divide-sqrt to F16=ON builds

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>

* Make GGML_SYCL_F16=ON the default

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>

* Leave F32 the default

F16 remains explicitly set for example and Dockerfile builds.

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>

* Revert changes to examples/sycl/build scripts

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>

---------

Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>
2026-06-16 08:34:02 +03:00
Pascal
ad39ccaa19
vulkan: add col2im_1d op (#24425)
* vulkan: add GGML_OP_COL2IM_1D, follow-up to the CPU op

* vulkan: col2im_1d bounded gather loop instead of full-K scan with modulo

* vulkan: col2im_1d address review from @jeffbolznv

* vulkan: col2im_1d return nullptr for unsupported types, address review from @0cc4m
b9661
2026-06-16 06:34:43 +02:00
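
The "bounded gather loop instead of full-K scan with modulo" item in the col2im_1d entry above can be illustrated in plain Python. This is only a sketch under stated assumptions, not the Vulkan shader: the `cols[i][k]` layout, the absence of padding and dilation, and both function names are hypothetical choices made for illustration.

```python
def col2im_1d_scan(cols, K, stride):
    """Full-K scan: test every tap k with a modulo for each output sample."""
    n = len(cols)                      # number of input frames (assumed layout)
    out_len = (n - 1) * stride + K     # assumes no padding and no dilation
    out = [0] * out_len
    for t in range(out_len):
        for k in range(K):
            if (t - k) % stride == 0:
                i = (t - k) // stride
                if 0 <= i < n:
                    out[t] += cols[i][k]
    return out


def col2im_1d_gather(cols, K, stride):
    """Bounded gather: visit only the frames that can contribute to sample t."""
    n = len(cols)
    out_len = (n - 1) * stride + K
    out = [0] * out_len
    for t in range(out_len):
        # Frame i contributes when 0 <= t - i*stride < K.
        i_min = max(0, (t - K) // stride + 1)   # ceil((t - K + 1) / stride)
        i_max = min(n - 1, t // stride)
        for i in range(i_min, i_max + 1):
            out[t] += cols[i][t - i * stride]
    return out
```

Both functions produce the same output. The gather variant loops over roughly `K / stride` frames per output sample instead of all `K` taps.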
Tarek Dakhran
7dad2f1a17
chat : fix LFM2 tool-call parsing double-escaping (#24667)
* Add escape test cases

* chat : fix LFM2 tool-call parsing double-escaping
b9660
2026-06-15 22:10:09 +02:00
Xuan-Son Nguyen
e36a602ba3
mtmd: fix miscounting n_tokens (#24656)
b9659
2026-06-15 18:07:14 +02:00
Piotr Wilkin (ilintar)
38d546330a
chat: include full unparsed prompt in debug message on parse error (#24650)
b9658
2026-06-15 17:33:54 +02:00
Julien Jerphanion
a1eb756c0b
docs: Add instructions to install llama.cpp from conda-forge (#22219)
* docs: Add instructions to install `llama.cpp` from conda-forge

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>

* Rewording of instructions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-06-15 17:12:25 +02:00
Pascal
581e8eca8b
chat: harden peg-native tool call parsing (#24329)
* chat: harden peg-native tool call parsing

accept an optional leading type: function field in
build_json_tools_flat_keys so openai style tool calls parse on
templates whose serialization opens on the name field.

return a clean error and log the unparsed fragment on a final peg
parse failure instead of throwing the raw parser position and input.

keep the raw arguments string in func_args_not_string when it is not
valid json instead of aborting the prompt render.

* chat: surface peg-native parse failures

a final peg parse failure threw the raw parser position and input. log
the unparsed fragment and raise a clearer error instead, so a model
output that does not match the expected format no longer fails silently
with an empty assistant turn.

minimal change, no behavior change on successful parses.

* chat: handle openai style tool calls in peg-native

* nits

* common: scope OpenAI wrapper grammar trigger via autoparser flag

* chat: gate type:function parsing leniency on the analysis flag

Thread accept_openai_wrapper from the generator to build_json_tools_flat_keys
so the leading "type": "function" field is accepted only when openai_wrapper_trigger is set.
b9656
2026-06-15 15:37:04 +02:00
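
The entry above mentions keeping the raw arguments string when it is not valid JSON instead of aborting the prompt render. A minimal Python sketch of that fallback idea follows. The function name and return convention are hypothetical, and the actual C++ code path is not shown in this log.

```python
import json


def parse_tool_arguments(raw: str):
    """Return parsed JSON when possible, otherwise the raw string unchanged."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: the caller can render a plain string, so we do not abort.
        return raw
```

With this shape, a malformed arguments string still reaches the template as text rather than stopping the render.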
Piotr Wilkin (ilintar)
0ae3f450f0
chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes (#24653)
* chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes

* update erroneous case in PEG parser test
b9655
2026-06-15 15:27:47 +02:00
Georgi Gerganov
e3cab403bf
mtmd : add post-decode callback (#24645)
Assisted-by: pi:llama.cpp/Qwen3.6-27B
b9654
2026-06-15 16:02:05 +03:00
Jeff Bolz
9dbc6621ae
vulkan: support more CONCAT types (#24579)
b9653
2026-06-15 13:19:21 +02:00
Andrei
6eab47181c
wasm : fix fallback symbol collision (#24639)
b9652
2026-06-15 10:11:59 +03:00
Katostrofik
e3bb1add8c
SYCL: use native subgroup size for K-quant DMMV (#21700)
b9651
2026-06-15 10:10:53 +03:00
someoneinjd
d8a3f523c8
sycl: fix soft_max_f32 max reduction (#24451)
b9650
2026-06-15 10:10:12 +03:00
Neo Zhang
72be44f1d2
sycl : fix reorder function; add fp32/fp16 in build script (#24578)
b9649
2026-06-15 10:08:34 +03:00
Neo Zhang
8872ab5467
sycl : enhance set_rows to support q1_0, mxfp4, nvfp4 (#24564)
2026-06-15 10:01:40 +03:00
Neo Zhang
987fbd821d
[SYCL] add to support pool_1d, move pool_1d/2d code to pool.cpp/hpp (#24584)
* add to support pool_1d, move pool_1d/2d code to pool.cpp/hpp

* update ops.md
b9647
2026-06-15 10:01:07 +03:00
Alexey Kopytko
c035ff4902
[SYCL]: Remove per-allocation Level Zero runtime checks (#23399)
* [SYCL] Centralize Level Zero detection in ggml_sycl_init

* use the same wording

* get back the warning

* [SYCL] Remove per-allocation getenv() for GGML_SYCL_ENABLE_LEVEL_ZERO

* bring back the comment

* move it up to make sure devices call the shots

* move the env detection early

* replace g_ggml_sycl_enable_level_zero with a direct call to .ext_oneapi_level_zero

* update the comment

* switch back to g_ggml_sycl_enable_level_zero with a sentinel

* remove the check

* Reduce the diff

* reword, move lower

* move things around

* remove forward declaration in favor of a full replace

* pre-cache results of zeDeviceGetProperties

* put ggml_sycl_get_env back

* replace get_sycl_env with ggml_sycl_get_env

* add whitespace back

* Apply suggestion from @sanmai
b9646
2026-06-15 09:58:42 +03:00
Georgi Gerganov
272088b9f2
metal : add repeat bf16 (#24638)
b9645
2026-06-15 09:57:16 +03:00
Piotr Wilkin (ilintar)
a6dff71270
chat: fix whitespace problems once and for all (#24624)
* chat: fix whitespace problems once and for all

* Purge trailing spaces from grammar generation

* Revert "Purge trailing spaces from grammar generation"

This reverts commit b0827ecb7d4767f37cefd751b3646f98d5303891.
b9644
2026-06-15 08:27:10 +02:00
Pascal
2a6c391a5e
UI/svg block rendering (#24080)
* ui: add svg block visualizer based on allozaur's mermaid PR

* ui: rationalise diagram block styling and pre transforms shared by mermaid and svg

* ui: live render streaming svg blocks

* ui: also render svg authored in xml code fences

* ui: refactor svg block rendering, address review from allozaur

- Move the svg size ceiling and DOMPurify config out of sanitize-svg.ts into /constants.
- Rename the svg-diagram class to svg-block so the name no longer implies diagrams only.
- Replace the svg, xml and svg tag magic strings in the markdown pipeline with shared constants.
- Promote the data-svg-rendered marker and its sibling data attributes to constants.

* ui: render svg blocks in a shadow root for animation and live zoom

Mount each sanitized svg inside an open shadow root so author <style> and
keyframe or smil animations run while staying scoped to the host element.
Relax the sanitizer to forbid only foreignObject and script, which lets
animation, href and external resource refs through for wider compatibility.
Render the inline block and the zoom dialog from the same reactive source,
so a streaming svg keeps drawing live inside the open zoom popup.
2026-06-15 08:11:36 +02:00
leonardHONG
3686e9d643
CUDA: only support F32/F16 for GGML_OP_REPEAT (#24533)
b9642
2026-06-15 09:11:00 +03:00
Masashi Yoshimura
6e9007ae61
ggml-webgpu: improve i-quants mul_mat performance and speed up prefill (#24530)
* Improve prefill speeds for i-quants

* Fix #if defined() usage in preprocessor guards.
b9641
2026-06-14 18:15:30 -07:00
Sigbjørn Skjæret
dd4623a74f
convert : fix lora base model arch retrieval (#24621)
2026-06-15 00:55:26 +02:00
franitel
ef8268feee
fix(ui): render thinking/reasoning block content as markdown (#24611)
* fix(ui): render thinking/reasoning block content as markdown

* feat(ui): add toggle setting for thinking block markdown rendering
2026-06-14 22:56:56 +02:00
Nicolas Mowen
5f04dc7ac3
ui: Add HEIC/HEIF image support (#24137)
* Add boilerplate for file types

* Add heic-to and implement conversion

* Load heic library from CDN

* Use jpg instead of png for conversion

* Move const to constants file
2026-06-14 20:42:16 +02:00
Piotr Wilkin (ilintar)
aedb2a5e9c
chat: add dedicated Cohere2MoE (North Code) parser (#24615)
* chat: add dedicated Cohere2MoE (North Code) parser

* Some renames to make @CISC happy :>
b9637
2026-06-14 20:17:40 +02:00
Mohammad Athar
8edaca9034
docs : fix typos in CUDA-FEDORA.md and grammars/README.md (#24459)
2026-06-15 01:33:38 +08:00
Alexander Batischev
20c5266f8a
docker: specify registry to simplify Podman builds (#24607)
2026-06-15 01:27:20 +08:00
Pascal
fd5869fb62
UI/mobile keyboard and pwa popup fixes (#24610)
* ui: make mobile layout keyboard-aware via interactive-widget and dvh shell anchor

* ui: fix duplicate PWA refresh popup by scoping the storage check to non-PWA pages
2026-06-14 18:35:00 +02:00
Amos Wong
1fd6dfe9f3
ui : fix ui clipping in mobile due to incorrect height setup (#24605)
2026-06-14 16:15:51 +02:00
Sigbjørn Skjæret
acd79d603c
jinja : add count/d/e filter aliases (#24606)
b9632
2026-06-14 15:07:31 +02:00
Michael Wand
6e14286eda
cli : fix not copying preserved tokens (#24258)
b9631
2026-06-14 11:52:15 +02:00
Bartowski
8ed274ef46
Add cohere2moe to llama-vocab for TINY_AYA (#24601)
b9630
2026-06-14 09:04:46 +02:00
Sigbjørn Skjæret
46722116b9
ci : use CUDA label for cuda backend (#24594)
2026-06-14 08:27:52 +02:00
Sigbjørn Skjæret
c2ba3e47a2
add sycl to check-release (#24583)
b9628
2026-06-14 09:42:26 +08:00
Aldehir Rojas
53bd47ea5b
ui : fix llama-ui-embed crash when no asset dir is given (#24597)
b9627
2026-06-13 17:53:30 -05:00
Michael Wand
4988f6e866
Add arch support for cohere2-MoE (#24260)
* Add arch support for cohere2-MoE

* Removed redundant gating_func checks

* Changed ffn lookup to prefer prefix_dense_intermediate_size

* Renamed arch to cohere2moe

* Removed redundant lmhead check and chat template changes

* Removed lm_head.weight check from modify tensors, load output tensor not required, fallback to token_embd.weight

* Changed to (routed+shared)*0.5 for shared expert combined avg

* fixed sliding_window_pattern issue and pattern

* Fixed transformers crash 'first_k_dense_replace' error

* Remove comment

* Removed cohere2-moe as a tokenizer type and kept as tiny_aya.  Renamed North-Mini-Code-1.0.

* Fixed MTP fail, changed to use iSWA

* Fixed remaining todos: cohere2moe renamed, changed swa parsing to use get_key_or_arr, removed extra get_arr use

* Force metadata usage

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Remove Cohere2 checkpoint comment

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Remove MTP comment

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Regenerate cohere2moe tokenizer hash

* Add cohere2moe to Llama Model Saver supported list

* Check for zerobios tensors and add support for Command to use LayerNorm

* Map expert_selection_fn to sigmoid in base.py instead of command.py

* use bools for foundnorm/foundnormrms

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
b9626
2026-06-13 19:49:00 +02:00
Sigbjørn Skjæret
f05cf4676a
jinja : fix negative step slice with start/stop values (#24580)
b9625
2026-06-13 18:28:40 +02:00
Xuan-Son Nguyen
e8067a8b36
ui: build-time gzip compression (#24571)
* ui: keep original file name and path

* fix nocache

* ui: build-time gzip compression
b9624
2026-06-13 16:57:27 +02:00
Sigbjørn Skjæret
341babcf73
jinja : fix split and replace with empty first arg (#24574)
* fix split and replace with empty first arg

* fix reserve size
b9623
2026-06-13 16:56:59 +02:00
Jeff Bolz
1a7718b4c5
vulkan: support non-contig unary/glu ops (#24215)
* vulkan: support non-contig unary/glu ops

Change unary/glu ops to pass in all strides and use fastdiv for the index
calculation. Put all unary ops in one file, similar to glu, to share the
code. codex went ahead and added expm1 without me asking, but I had to
make it do a real precision analysis rather than just making stuff up.

unary.comp initially couldn't use generic_unary_head because there wasn't
space for xielu's additional constants. Fixing this required packing the
fastdiv 'L' values.

* attempt to workaround compiler bug

* resolve conflict from #23991

* use expm1
b9622
2026-06-13 08:44:15 -05:00
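
The entry above uses fastdiv for the index calculation and mentions the fastdiv 'L' values. As a rough illustration, here is one common multiply-and-shift formulation of division by a fixed divisor, written in Python. It is a sketch under assumptions: the function names and the 32-bit width are illustrative choices, and the shader's actual code and packing scheme are not shown in this log.

```python
def init_fastdiv_values(d: int):
    """Precompute (mp, L) for a fixed divisor d, with 1 <= d < 2**32."""
    L = 0
    while L < 32 and (1 << L) < d:
        L += 1                          # L = ceil(log2(d))
    mp = (1 << 32) * ((1 << L) - d) // d + 1
    return mp, L


def fastdiv(n: int, mp: int, L: int) -> int:
    """Compute n // d for 0 <= n < 2**32 using a multiply, an add and a shift."""
    hi = (n * mp) >> 32                 # high 32 bits of the 64-bit product
    # Python integers do not overflow; fixed-width code must handle hi + n with care.
    return (hi + n) >> L


mp, L = init_fastdiv_values(3)
quotient = fastdiv(10, mp, L)           # same result as 10 // 3
```

The divisor-dependent work happens once in `init_fastdiv_values`, so each index computation avoids a hardware integer division.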
Xuan-Son Nguyen
597b6672e8
ui: keep original file name and path (#24568)
* ui: keep original file name and path

* fix nocache
b9621
2026-06-13 14:31:41 +02:00
Xuan-Son Nguyen
57fe1f07c3
server: clean up static assets handling (#24550)
* server: clean up static assets handling

* nits

* simplify file name handling, use static file name everywhere

* cmake/ui : bundle UI assets in an archive

* ui : run prettier on post-build.js

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
b9620
2026-06-13 11:51:20 +02:00
Georgi Gerganov
d8a24ccee2
fit : wrap llama_device_memory_data (#24522)
b9619
2026-06-13 08:09:52 +03:00
Muhammad Salem
c34b92235b
fix sycl links in release notes (#24527)
* fix sycl links in release notes

* remove extra line
2026-06-13 08:37:55 +08:00
Xuan-Son Nguyen
e37abd6b5f
mtmd: add batching API (#24384)
* mtmd: add batching API

* wip

* first working version (gemma4v)

* add arg

* nits

* wire up support_batch()

* fix 0.0 output embd

* fix audio

* nits

* refactor a bit

* nits

* fix non-batching case

* fix comment
2026-06-13 00:10:29 +02:00
Sigbjørn Skjæret
f58bad4137
ci : unbreak release harder (#24545)
* unbreak release harder

* missed one

* remove missing test for now
b9616
2026-06-12 23:49:36 +02:00
Sigbjørn Skjæret
cd5044661c
ci : unbreak release (#24544)
2026-06-12 23:29:49 +03:00
Georgi Gerganov
ebc10770ac
server : fix reasoning budget WebUI precedence over model.ini (#24517)
When reasoning-budget is set in model.ini, the per-request
thinking_budget_tokens from the WebUI was ignored because the
model.ini value took unconditional precedence.

Swap the precedence so the WebUI per-request value is checked
first, with the model.ini value serving as a fallback default.

Assisted-by: pi:llama.cpp/Qwen3.6-27B
2026-06-12 17:59:56 +03:00
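
The precedence rule described in the entry above (per-request value first, model.ini value as the fallback default) can be sketched as follows. This is a hypothetical Python illustration, not the server code: the function name and the use of `None` to mean "not set" are assumptions.

```python
from typing import Optional


def resolve_reasoning_budget(request_budget: Optional[int],
                             ini_budget: Optional[int]) -> Optional[int]:
    """Prefer the per-request budget; fall back to the model.ini default."""
    if request_budget is not None:      # assumption: None means "not provided"
        return request_budget
    return ini_budget
```

The check is `is not None` rather than a truthiness test, so an explicit per-request value of 0 is still honoured instead of being replaced by the fallback.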
Ruben Ortlam
3e7bd4f39a
vulkan: add pipeline barriers for memcpy read operations (#23770)
* vulkan: add pipeline barriers for memcpy read/write operations

* remove unnecessary host write pipeline barriers
2026-06-12 16:43:50 +02:00