998 Commits

Author SHA1 Message Date
Saba Fallah
894bb27af3
mtmd: model: unlimited-ocr: converter + parity test (#24969) 2026-06-24 18:20:22 +02:00
Pascal
00139b660b
ui: loading bar below the model picker (#24931)
* ui: show model load progress on the selector trigger

Mirror the in-dropdown stage progress as a thin bar on the selector
trigger, so the active model's load percent stays visible when the menu
is closed. Same status gating and composite fraction as the dropdown
row, so both bars track the selected model in sync.

Suggested-by: Julien Chaumond <@julien-c>

* ui: show model load progress bar on the in-conversation model selector

* ui: tune model load indicator to a pulsing highlight (suggested by @ngxson)

Also wire the indicator onto the mobile sheet trigger, which was missing
it since mobile uses the sheet instead of the dropdown.

* ui: thin (@allozaur) pulsating (@ngxson) model load bar
2026-06-24 10:50:44 +02:00
Aleksander Grygier
ef9c13d4c2
ui: New Logo + Navigation cleanup & Mobile UI/UX improvements (#24897)
* chore: `npm audit fix --force`

* feat: Update sidebar toggle to use Logo

* refactor: Clean up favicon SVG

* feat: Refactor logo component and implement theme-aware favicon generation

* feat: Add configurable padding to generated PWA assets

* test: Add unit tests for writeThemeFavicons

* refactor: Componentization

* feat: WIP

* feat: WIP

* feat: WIP

* feat: Mobile UI

* feat: add SEARCH route constant

* feat: create SidebarNavigationSearchResults component

* refactor: use SidebarNavigationSearchResults in conversation list

* feat: enable mobile search navigation in sidebar actions

* feat: add mobile search route and page

* fix: prevent sidebar overflow on mobile viewports

* fix: Mobile sidebar

* feat: Mobile Search WIP

* feat: Mobile WIP

* feat: Add PWA standalone detection and refine mobile UI

* feat: Improve mobile layout, sidebar handling, and chat scrolling

* feat: Improve mobile sidebar visibility and iOS Safari chat spacing

* fix: Disable auto-scroll on mobile

* chore: Linting

* fix: Wrong condition

* feat: Mobile chat scroll

* refactor: WIP

* fix: Desktop initial scroll always working again

* fix: Partial fix for mobile auto-scroll / initial scroll

* fix: Desktop auto-scroll on initial load and during streaming

* fix: Mobile scrolling logic

* refactor: Clean up

* feat: Improve start UI

* feat: Add `delay` to `fadeInView`

* feat: Auto-scroll button

* refactor: Cleanup

* refactor: Extract chat dialogs and alerts into dedicated component

* refactor: Reorganize ChatScreen component structure and initialization

* feat: Improve auto-scroll after sending message

* feat: UI improvements

* fix: Settings link

* feat: UI improvements

* fix: better UI spacing

* fix: Remove unneeded logic

* fix: Chat Processing Info UI rendering

* feat: Improve mobile UI

* feat: UI improvement

* fix: Conditional transition delay for Chat Messages based on route from

* fix: Delay mobile sidebar collapse for smoother transitions

* fix: Mobile scroll down button + sidebar pointer events

* fix: Mobile UI

* fix: Auto scrolling

* fix: Implement dynamic height calculations for chat auto-scroll positioning and UI elements

* fix: Retrieve `autofocus` for Chat Form textarea

* fix: Use proper class

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* refactor: extract scroll-to-bottom logic and fix message send flow

* fix: update viewport store usage and remove conflicting autofocus

* feat: add accessibility labels to scroll down button

* fix: correct HTML structure in sidebar empty states

* fix: dynamically toggle processing info visibility

* chore: remove commented exports and fix formatting

* fix

* fix: Mobile Chat Form Add Action Sheet interactions

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-24 10:21:33 +02:00
kononnable
be4a6a63eb
server : check draft context creation error (#24922) 2026-06-23 16:56:50 +02:00
Xuan-Son Nguyen
75ad0b23ed
server: fix remote preset handling, add test (#24938)
* server: add test for remote preset

* fix remote preset handling

* fix

* fix test
2026-06-23 13:28:34 +02:00
Gabe Goodhart
a3900a6694
model: Granite Speech Plus (#24818)
* feat: Add conversion support for Granite Speech Plus

Branch: GraniteSpeechPlus
AI-usage: full (Bob, OpenCode + Qwen3.6-35b)
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Extend granite_speech to support plus multi-layer concatenation

Branch: GraniteSpeechPlus
AI-usage: draft (Bob, OpenCode + Qwen3.6-35b)
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(conversion): Fix plural naming for feature_layers for audio

Branch: GraniteSpeechPlus
AI-usage: none
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(mtmd): Align feature_layer usage and naming everywhere

Branch: GraniteSpeechPlus
AI-usage: none
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* style: Use fstring for log

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-06-23 12:03:31 +02:00
Aldehir Rojas
73618f27a8
server: improve user message detection and create checkpoints at every user message (#24176)
* server : improve message span logic

* cont : cast size_t to int32_t in comparisons

* server : create checkpoints before every user msg

* chat : remove \n in gemma4 delimiters

* chat : merge msg delimiter structs into one

* cont : reword comment

* cont : initialize tokens in delimiter

* cont : add server_tokens::get_raw_tokens() for mtmd

* cont : move message finding to server_tokens and skip mtmd tokens

* cont : update cohere2moe parser

* cont : increase min-step to 8192 and always produce a chkpt for last user message
2026-06-23 08:27:28 +03:00
Matt Thompson
dec5ca5577
server : Add id to tool call responses api (#24882) 2026-06-22 23:03:12 +02:00
Mahdiou Diallo
9c0ac887f3
ui: Prioritize favorite models in model selection (#24766)
Updated model selection prioritization to include favorite models.
2026-06-22 21:00:21 +02:00
Xuan-Son Nguyen
721354fbdf
server: (router) move model downloading to dedicated process (#24834)
* server: real-time model load progress tracking via /models/sse

* update docs

* server: move model download to child process

* rm unused

* fix most problems

* clean up

* nit fixes

* fix test case

* do not detact() thread

* shorter MODEL_DOWNLOAD_TIMEOUT in test

* throttle
2026-06-22 18:24:04 +02:00
Xuan-Son Nguyen
6ee0f65793
server: refactor/generalize input file schema (#24299)
* server: refactor/generalize input file schema

* wire up input_video, accept raw base64

* nits

* nits (2)

* fix windows
2026-06-22 16:42:47 +02:00
Pascal
099b579acb
ui: model status and load progress via /models/sse feed (#24878)
* ui: model status and load progress via /models/sse feed

* ui: centralize SSE wire-format delimiters into shared constants for the chat and /models/sse parsers

* ui: type /models/sse event names as a ServerModelsSseEventType enum

Address review from allozaur
2026-06-22 15:55:30 +02:00
Pascal
d0f9d2e5ac
server: fix edit_file crash on append at end of file (line_start -1) (#24893)
line_start -1 normalized to n+1, so append inserted at lines.begin() + n + 1,
one past end() -> heap-buffer-overflow in vector::_M_range_insert.

Normalize -1 to n (insert at end()), restrict -1 to append mode and reject it
for replace/delete instead of silently clobbering the last line. Parenthesize
the insert offset so empty-file append computes the position as int first,
avoiding a transient begin() - 1 on a null vector data pointer.
2026-06-22 10:55:28 +02:00
Xuan-Son Nguyen
7c082bc417
server: fix report progress for loading spec models, add "stages" list (#24870)
* server: fix report progress for loading spec models, add "stages" list

* improve

* nits

* nits 2
2026-06-21 17:36:52 +02:00
Xuan-Son Nguyen
bddfd2b113
server: refactor batch construction (#24843)
* server: refactor batch construction

* wip

* wip 2

* wip 3

* wip 4

* add abort_all_slots

* handle batch full more carefully

* fix assert

* rm debug log

* small nits

* (debug) add timings

* debug: force llama_synchronize for accurate timings

* address comments

* disable DEBUG_TIMINGS
2026-06-21 14:16:11 +02:00
Xuan-Son Nguyen
0d135df48c
mtmd: fix mtmd_get_memory_usage (#24867) 2026-06-21 14:12:15 +02:00
Xuan-Son Nguyen
2f89acc2bc
mtmd: add load progress callback (#24865) 2026-06-21 13:40:52 +02:00
Xuan-Son Nguyen
bfa3219177
server: add "verbose" field to schema (#24864) 2026-06-21 13:03:14 +02:00
Xuan-Son Nguyen
d6d899580d
server: real-time model load progress tracking via /models/sse (#24828)
* server: real-time model load progress tracking via /models/sse

* update docs

* add mutex for notify_to_router

* correct docs
2026-06-21 11:58:14 +02:00
Aldehir Rojas
063d9c156e
common/peg : refactor until gbnf grammar generation (#24839)
* common/peg : refactor until gbnf grammar into an ac automaton

* cont : add a test with multiple strings

* cont : pad state with 0s so rules line up

* cont : clean up comments

* cont : use set everywhere

* cont : inline state num string padding

* cont : add a ref to PR

* cont : fix regression in server-tools.cpp
2026-06-20 21:15:06 -05:00
Matti4
e27f308597
server: avoid forwarding auth headers in CORS proxy (#24373)
* server: avoid forwarding auth headers in CORS proxy

* format

* fix test

* fix e2e test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-06-20 15:34:47 +02:00
Xuan-Son Nguyen
2b686a9120
server: refactor child --> router communication (#24821)
* server: refactor child --> router communication

* fix wakeup case

* add docs

* improve update_status()

* nits
2026-06-20 01:02:26 +02:00
Adrien Gallouët
4b48a53b6c
server : optimize get_token_probabilities (#24796)
Use std::partial_sort to order only the requested top-n tokens instead
of the full vocabulary

    logprobs sort: vocab=128000 n_top=0 iters=100
    full    sort:   8555.6 us/op
    partial sort:    704.3 us/op

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-06-19 23:26:54 +02:00
Xuan-Son Nguyen
e475fa2b5f
mtmd, arg: fix utf8 handling on windows (#24779)
* mtmd, arg: fix utf8 handling on windows

* also fix ggml_fopen

* fix build fail

* also fix CLI
2026-06-19 22:28:38 +02:00
Xuan-Son Nguyen
175147e8f6
server: remove all internal mentions about "webui" (#24817) 2026-06-19 22:12:46 +02:00
Mikolaj Kucharski
fabde3bf51
arg: Add comment line support to --api-key-file (#23168) 2026-06-19 17:33:54 +02:00
Xuan-Son Nguyen
8c2d6f6475
server: add --agent arg, remove redundant webui naming compat (#24801)
* server: add --agent arg, remove redundant webui naming compat

* corrent env

* fix the test

* llama-gen-docs

* nits: wordings
2026-06-19 16:06:13 +02:00
Xuan-Son Nguyen
e2e7a9b2d0
mtmd: several bug fixes (#24784)
* mtmd: several bug fixes

* fix build

* fix gemma4ua

* add sanity check in get_u32()

* fix build (2)

* area() avoid overflow
2026-06-19 12:18:36 +02:00
Ruixiang Wang
b14e3fb90c
spec: support eagle3 for qwen3.5 & 3.6 (#24593)
* spec: support qwen3.5 & 3.6 eagle3 draft

* eagle3: Add deferred boundary checkpoints restore support for hybrid models

* apply suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* spec: adapt to API change

* spec: fix naming

* cont : add TODO

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-06-19 13:08:50 +03:00
Xuan-Son Nguyen
159d093a43
server: fix non-bound n_discard value (ctx shifting) (#24786)
* server: fix non-bound n_discard value

* Update tools/server/server-context.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-06-19 10:53:44 +02:00
Georgi Gerganov
80452d65b9
server : consolidate slot selection into get_available_slot (#24755)
Absorb get_slot_by_id logic into get_available_slot so slot selection
is handled by a single function call. When a specific slot id is
requested, the LCP similarity check still runs to enable proper
prompt cache updates.

Assisted-by: pi:llama.cpp/Qwen3.6-27B
2026-06-19 09:22:34 +03:00
Xuan-Son Nguyen
db52540f73
mtmd: add batching support for internvl (#24775) 2026-06-19 01:16:16 +02:00
Reguna
40f3aafc45
server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774)
* server: add "X-Accel-Buffering": "no" header to streaming endpoints

This header tells Nginx (as a reverse proxy) to NOT buffer responses. (only affects streaming endpoints)
Without it, Nginx will break streaming with certain applications (notably the Pi coding harness).
2026-06-18 22:01:24 +02:00
Xuan-Son Nguyen
a6b3260a42
mtmd: add batching for mtmd-cli, add video tests (#24778) 2026-06-18 21:55:04 +02:00
Xuan-Son Nguyen
060ce1bf72
mtmd: refactor llava-uhd overview image handling (always use ov_img_first) (#24769)
* add dedicated "overview" for mtmd_image_preproc_out

* corrections

* correct (again)

* nits

* nits (2)
2026-06-18 18:53:49 +02:00
Kangjia Gao
7b6c5a2aed
docs: fix export-lora --lora-scaled syntax [no release] (#24703)
Assisted-by: Codex
2026-06-18 16:46:17 +02:00
Xuan-Son Nguyen
fe7c8b2414
server: (router) fix stopping_thread potentially hang (#24728)
* server: (router) fix stopping_thread potentially hang

* fix windows build
2026-06-18 15:41:09 +02:00
Xuan-Son Nguyen
e1efd0991d
server: add "schema" and validation (#24150)
* wip

* working

* correct some limits

* add field name to error message
2026-06-18 15:40:58 +02:00
Aarni Koskela
08023072ef
server : add last-5-seconds generation speed display (#24291)
* server : add last-5-seconds generation speed display

* cont : clean-up

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-06-18 14:02:20 +02:00
Amos Wong
20832179e2
ui: provide touch accessible model selection UI (#24604)
* ui : add model selector storybook stories

Covers list, favorites, single-model, all status states
(loading/loaded/sleeping/failed/idle), and selection states.

* ui : improve model selector mobile UX with hover media queries

Use @media (hover:none) to show action buttons directly on touch
devices and color-code them by model status (amber=sleeping,
green=loaded, muted=idle). Status dots hidden on touch. Desktop
hover behavior unchanged.
2026-06-18 13:14:20 +02:00
Anuj Attri
10786217e9
server : return HTTP 400 on invalid grammar (#24144) (#24154)
Throw on grammar parse failure so the server returns HTTP 400
instead of silently dropping the constraint.
Add a regression test for the invalid-grammar response.

Fixes #24144
2026-06-18 12:49:14 +02:00
Xuan-Son Nguyen
552258c535
server: (router) rework -hf preset repo (#24739)
* server: temporary remove HF remote preset

* rework remove preset.ini support

* rm unused get_remote_preset_whitelist()

* print warning

* add docs

* rm stray file
2026-06-18 12:45:23 +02:00
Xuan-Son Nguyen
968c43891a
server: fix router args not being forwarded to child instances (#24760) 2026-06-18 12:15:46 +02:00
Xuan-Son Nguyen
24bba7b98e
mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736)
* add mtmd_image_preproc_out

* add dev docs

* remove unused clip API

* rm unused clip_image_f32_batch::grid

* change preprocess() call signature
2026-06-18 12:04:39 +02:00
Aleksander Grygier
0b73fc79fe
ui: Update code formatting command in pre-commit hook (#24685) 2026-06-18 08:33:50 +02:00
Xuan-Son Nguyen
f3e1828164
mtmd: llava_uhd should no longer use batch dim (#24732) 2026-06-17 22:40:50 +02:00
Xuan-Son Nguyen
4b4d13ae72
server: (router) add model management API (#23976)
* wip

* server: (router) add SSE realtime updates API

* nits

* wip

* add download API

* add download api

* update docs

* add delete endpoint

* fix std::terminate

* fix crash

* fix 2

* add tests

* nits
2026-06-17 18:04:58 +02:00
Julien Chaumond
8086439a4c
webui: export conversations as jsonl (#24688)
* webui: export conversations as jsonl

each session is one jsonl file, a session header line followed by one line per message
exporting multiple conversations bundles them into a zip, one jsonl file each

* webui: import jsonl and zip conversation exports

parse the new jsonl session format and zip archives on import
keep supporting the legacy json format
2026-06-17 13:25:47 +02:00
Harapan Rachman
bae36efa30
UI : fix SSE transport detection and routing through CORS proxy. Assi… (#24500)
* UI : fix SSE transport detection and routing through CORS proxy. Assisted-by: Antigravity

* ui : replace magic strings with constants in MCP transport handling
2026-06-17 08:26:30 +02:00
Pascal
c1304d7b28
ui: add source toggle to mermaid and svg blocks (#24652)
* ui: add source toggle to mermaid and svg blocks

Add a toggle button next to copy and preview that switches a rendered
mermaid or svg block to its source code and back. The button is shared by
both block types and the rendered view stays the default.

The source view reuses the code block scroll container and the highlighted
code element captured at transform time, so it matches the app code blocks
without highlighting again.

Make tall diagrams scroll like text code blocks: safe centering keeps the
diagram centered when it fits and falls back to start alignment when it
overflows, so the top stays reachable instead of clipping above.

Keep the block header opaque and layered above the scrolled diagram, and
ignore header clicks in the zoom handler, so a button click never falls
through to the zoom dialog.

* ui: transparent diagram block header, address review from @allozaur
2026-06-16 14:14:22 +02:00