llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-27 23:50:20 -05:00

Author	SHA1	Message	Date
Pascal	1a87dcdc45	server + ui: SSE Replay Buffer (#23226 ) * server: SSE replay buffer, survives client disconnect Opt in on POST /v1/chat/completions when the client sends X-Stream-Resume: 1 and a non empty X-Conversation-Id. The conv id is the session identity end to end, no extra opaque token. The drain runs detached server side and buffers SSE bytes, the generation survives HTTP disconnect, F5, or lets users switch from iOS Safari to another app without losing the actively generated response. Routes: GET /v1/stream/<conv_id>?from=N replay GET /v1/streams[?conversation_id=X] list, drives sidebar spinners DELETE /v1/stream/<conv_id> Stop, idempotent Router parent fans out to children for list and delete, probes on GET to route to the owner, fans out DELETE on POST so "one session per conv" holds across model swaps. WebUI: the layout snapshots /v1/streams at mount and on visibilitychange, the sidebar reflects live inferences across all convs. The chat page reattaches on mount, append vs fresh is detected from existing content so continue mid stream keeps its prefix. update_slots: on llama_memory_seq_rm refusal at a deep position, full clear of the seq and reprefill from zero instead of GGML_ABORT. OAI strict path unchanged when the opt in headers are absent. * server: create stream session only after post_tasks succeeds * server, ui: drop X-Stream-Resume, X-Conversation-Id alone enables the replay buffer * server: drop magic 17, derive the X-Conversation-Id header length from sizeof at build time * refactor: address review feedback from ngxson * server-context: cleaning * server-stream: fix use-after-free on rd Guard stop_producer with a shared alive flag, flipped by on_stream_end before rd dies. Prevents a late cancel (session eviction by a later POST on the same conv_id, or a DELETE arriving after the producer ended) from touching a destroyed rd. 
* ui: fix cross-conversation contamination Scope streaming flags per conv so one finishing does not unflag the others, guard discoverActiveStream against concurrent runs to avoid duplicate attaches, and stop racing syncRemoteRunningStreams for the sidebar set. * server-http: keep request alive in detached SSE drain The response next() lambda may reach into request via &req long after on_complete reset the request shared_ptr. Capture request in the detached thread so it outlives the drain. ui: address review feedback from coder543 Forward Authorization to /v1/stream and /v1/streams fetches, the resumable routes must obey --api-key like the rest of the API. Wrap reader.read() in a try/catch, the underlying connection drop rejects with TypeError instead of resolving done=true, treat it as a premature end of stream so the existing resume loop kicks in. Freeze the model at session start in chatStreamingStates.model and thread it through cancel and resume, the dropdown selection may have changed since the POST and the server side identity is fixed at that time. * format * ui: remove unused selectedModelName * server-stream: poll session->is_cancelled() in stream_aware_should_stop Address review feedback from coder543. The cancel propagation through rd.stop() relies on the slot eventually processing the cancel task and posting a result that notifies the recv condvar, remove_waiting_task_ids does not notify directly. Add a defensive poll on session->is_cancelled() so the producer-side next() loop exits on its next iteration after cancel() without waiting for the cancel task to round trip through a slot. * server-stream, ui: replace GET /v1/streams with POST /v1/streams/lookup Address review feedback from coder543. Listing live sessions leaks the conversation_id of every concurrent user, which defeats the random UUID unguessability. 
The new route takes {conversation_ids: [...]} in the body and returns matches only for the ids the caller already owns, so foreign UUIDs stay private. The router fans out the same POST to every child and aggregates, the WebUI passes the convs visible in its sidebar. * ui: read conv ids from IndexedDB in syncRemoteRunningStreams The conversations store is not hydrated yet at +layout onMount, so the sidebar spinners stayed off for background convs until the user clicked on them. Read straight from the DB to dodge the init race. * server-models: deduplicate stream lookup timeouts behind one constant * ui: extract visibility kick grace into a stream constant, bump to 1000 ms * make it safer & more simple * server-stream: survive client disconnect via stream_pipe::finish_producer After the RAII rewrite the generation stopped the moment the client disconnected. httplib bails its content provider on the is_peer_alive check at the top of write_content_chunked, so returning true from the provider never keeps it producing: the response resets, rd is destroyed and its task gets cancelled. Reinstate the disconnect survival inside the pipe. stream_pipe gains finish_producer, which pumps the response next() into the ring buffer until the generation ends, and mark_producer_done for the clean wire end. server-http only triggers them: mark before sink.done on a clean close, finish in on_complete when the peer left early. No detach, no stream logic in server-http beyond the trigger, and the strict OAI path is untouched when no pipe is attached. Known limitation: finish_producer pumps synchronously on the http worker, so a disconnected stream keeps its worker busy until the generation ends. A follow-up will move the drain off the http worker so no worker is held. 
* server-stream: drain disconnected streams on a manager owned thread The previous commit pumped the post disconnect drain synchronously in on_complete, on the http worker, so a disconnected stream kept its worker busy until the generation ended. Under a wave of reloads or tab closes that pins workers from the pool. Move the drain off the http worker. on_complete now hands the response to stream_session_manager::adopt_orphan, which pumps it to completion on a manager owned thread and releases the worker at once. One thread per disconnected stream still generating, stored in a list, joined and reaped on the next adopt, by the GC, and at shutdown. No detach, the thread lifecycle is fully owned by the manager. needs_drain gates the handoff so a cleanly finished stream never spawns a thread, and the strict OAI path stays untouched when no pipe is attached. stop_gc now cancels sessions before finalizing them, so an in flight drain sees is_cancelled and exits instead of blocking the shutdown join until the generation ends naturally. * ui: add missing JSDoc * server-stream: drain on the http worker, drop the manager thread Address @ngxson review: httplib runs a large dynamic pool and a worker blocked in next() sits on a condvar instead of burning cpu, so draining the rest of the generation on that worker is fine and much simpler than a dedicated thread. on_complete calls finish_producer directly again. Removes adopt_orphan, the orphan thread list and its reaping, the stop_gc session cancel that only existed to unblock those threads, and the now dead drain_shutdown flag. * server-stream: split stream_pipe into producer and consumer classes Address @ngxson review: one class covering both ends was messy. stream_pipe is now a base holding the session and is_cancelled, with stream_pipe_producer (write, mark_producer_done, finish_producer, cleanup, finalizes on destruct) and stream_pipe_consumer (read only, no finalize) deriving from it. 
Drops the is_producer_ discriminator and its runtime guards, the type now encodes the role. res.spipe is retyped to shared_ptr<stream_pipe_producer> since it is only ever a producer. No behavior change. * server-stream: rename producer methods to unix pipe semantics Address @ngxson review: mark_producer_done becomes done(), finish_producer becomes close(), matching a unix pipe write end. The producer_done_ member follows as done_. write() is unchanged. No behavior change. * server, ui: route resumable streams via a conv map, persist resume identity Address ngxson review: drop the polling probe, proxy_post records a conv_id -> model map and the stream routes resolve the owning child with one lookup. The map is the single source of truth, the ::model suffix stays for child session uniqueness but the router never parses it. UI: the server keys a session by the POST time identity (conv::model), but reload probed with the bare conv id and missed model tagged sessions, so F5 stopped the stream and sidebar spinners stayed off. Persist the model and rebuild the exact identity on resume, single conv and bulk sidebar both send it. Add unit coverage for the identity round trip. * ui: resolve continue target by id to stop cross-conversation flash on switch * ui: skip stream resume when the abort is intentional * server: move the conv id to model map into a self contained tracker Address review from ngxson: server_models held two mutexes side by side, the global one and a bare conv_model_mu guarding a loose map, which made the locking hard to follow. Wrap the map and its lock in a small conv_model_tracker struct that owns its mutex, one mutex per struct. The remember, lookup and forget methods move inline into the tracker, server_models exposes a single conv_models member and the routes call models.conv_models.lookup and friends. No behavior change, the map stays the single source of truth for routing resumable streams to a child. 
* ui: replace stream magic values with enums and shared constants Address review from allozaur: lift the inline literals around the resumable stream code into named symbols so the intent is explicit and reusable. * ui: fold the stream resume and discovery helpers into ChatService Address review from allozaur: drop the two standalone stream-.service files. They were used only by the chat service and store, carried no shared state, and did not follow the static class pattern the other services use, so a separate abstraction was not warranted. Move the helpers onto ChatService as static methods. No behavior change, tests now exercise them through ChatService. docs: document the SSE replay buffer in server README-dev Add the resumable streaming section, list stream_session_manager in the backend component inventory, and link PR 23226 in the related PRs. * ui: align attachServerStream call with onCompletionId param in handleStreamResponse * server-http: rename del_ to del to match get and post * ui: address review feedback from allozaur * ui: drop duplicate SSE constants, keep sse.ts canonical * ui: use svelte:document for the visibilitychange listener address review from allozaur: replace the manual document.addEventListener in onMount with a declarative <svelte:document onvisibilitychange>. svelte handles attach, detach and SSR, so the typeof document guard and the onMount cleanup go away. onMount keeps only the first load snapshot. * server: trim redundant stream drain comments Address review from ngxson * server: balance and clean up stream comments remove redundant comments and tighten the verbose ones across the resumable stream code, keeping the concurrency and lifetime rationale that is not obvious from the code. 
also fix two stale comments in server.cpp and server-models.h that still described the old ::model suffix probe and fan out routing, now replaced by the conv_id -> model map Address review from ngxson * ui: balance and clean up stream comments dedup repeated rationale (frozen conv::model identity, the lookup privacy note, the abort patterns) down to one canonical spot, tighten the verbose blocks, and keep the concurrency and resume-offset reasoning. fix stale comments in stream-identity.ts and chat.service.ts that still described the old loopback probe and fan out routing, now the conv_id -> model map. --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-06-26 09:31:29 +02:00
Xuan-Son Nguyen	721354fbdf	server: (router) move model downloading to dedicated process (#24834) * server: real-time model load progress tracking via /models/sse * update docs * server: move model download to child process * rm unused * fix most problems * clean up * nit fixes * fix test case * do not detach() thread * shorter MODEL_DOWNLOAD_TIMEOUT in test * throttle	2026-06-22 18:24:04 +02:00
Xuan-Son Nguyen	d6d899580d	server: real-time model load progress tracking via /models/sse (#24828) * server: real-time model load progress tracking via /models/sse * update docs * add mutex for notify_to_router * correct docs	2026-06-21 11:58:14 +02:00
Xuan-Son Nguyen	2b686a9120	server: refactor child --> router communication (#24821) * server: refactor child --> router communication * fix wakeup case * add docs * improve update_status() * nits	2026-06-20 01:02:26 +02:00
Xuan-Son Nguyen	175147e8f6	server: remove all internal mentions about "webui" (#24817)	2026-06-19 22:12:46 +02:00
Xuan-Son Nguyen	8c2d6f6475	server: add --agent arg, remove redundant webui naming compat (#24801) * server: add --agent arg, remove redundant webui naming compat * correct env * fix the test * llama-gen-docs * nits: wordings	2026-06-19 16:06:13 +02:00
Xuan-Son Nguyen	4b4d13ae72	server: (router) add model management API (#23976) * wip * server: (router) add SSE realtime updates API * nits * wip * add download API * add download api * update docs * add delete endpoint * fix std::terminate * fix crash * fix 2 * add tests * nits	2026-06-17 18:04:58 +02:00
Xuan-Son Nguyen	06d26dfdff	download: add option to skip_download (#23059) * download: add option to skip_download * fix * fix 2 * if file doesn't exist, respect skip_download flag	2026-05-29 16:30:55 +02:00
Aleksander Grygier	59778f0196	ui: Restructure repo to use `tools/ui` folder and `ui` / `UI` / `llama-ui` / `LLAMA_UI` naming (#23064 ) * webui: Move static build output from `tools/server/public` to `build/ui` directory * refactor: Move to `tools/ui` * refactor: rename CMake variables and preprocessor defines - Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated) - Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated) - Backward compat: old vars auto-forward to new ones with DEPRECATION warning - Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc. - Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET - Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines - Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED * refactor: rename CLI flags (--webui -> --ui) with backward compat - Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases) - Add --ui-config (old --webui-config kept as deprecated alias) - Add --ui-config-file (old --webui-config-file kept as deprecated alias) - Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated) - Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY - C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields - Backward compat: old fields synced to new ones in g_params_to_internals * refactor: update C++ server internals with backward compat - Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta) - Rename params.webui usage -> params.ui (both synced, old still works) - JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys - Server routes use params.ui_mcp_proxy \|\| params.webui_mcp_proxy - Preprocessor guards use #if defined(LLAMA_BUILD_UI) \|\| defined(LLAMA_BUILD_WEBUI) * refactor: rename CI/CD workflows, artifacts, and build script - Rename 
webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build - Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT - Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks - Update server.yml: job/artifact refs webui-build -> ui-build - Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT - Update server-self-hosted.yml: webui-build -> ui-build - Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION - Rename webui-download.cmake -> ui-download.cmake (internal refs updated) - Update labeler.yml: server/webui -> server/ui path label * docs: update CODEOWNERS and server README docs - Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/ - Update server README.md: CLI tables show --ui flags with deprecated --webui aliases - Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/ * fix: Small fixes for UI build * fix: CMake.txt syntax * chore: Formatting * fix: `.editorconfig` for llama-ui * chore: Formatting * refactor: Use `APP_NAME` in Error route * refactor: Cleanup * refactor: Single migration service * make llama-ui a linkable target * fix: UI Build output * fix: Missing change * fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI * refactor: UI workflows cleanup --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-05-16 02:02:40 +02:00
Xuan-Son Nguyen	7bfe120c21	mtmd, server, common: expose modalities to /v1/models (#22952) * mtmd, server, common: expose modalities to /v1/models * fix build * rename to mtmd_caps	2026-05-12 19:08:07 +02:00
Xuan-Son Nguyen	9dcf835528	server: (router) expose child model info from router's /v1/models (#22683) * server: (router) expose child model info from router's /v1/models * update docs	2026-05-08 14:42:15 +02:00
Xuan-Son Nguyen	935a340292	server: implement /models?reload=1 (#21848)	2026-05-04 16:23:26 +02:00
tha80	983ca8992e	server: (router) Forward form-data to model server (Fixes #22044) (#22118) * This commit enables the router to forward form-data to model server. Fixes #22044 (enabling use of /v1/audio/transcriptions in router mode) * Applied the suggestion from Copilot's first comment: using the non-throwing json::parse overload. * Addressed Copilot's third comment by extending the files representation to also include filename and content-type * Addressed Copilot's fourth comment by making the RNG thread_local * Changed variable body from std::string to std::ostringstream in build_multipart_body as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127099053 * Added sanitize_field lambda in build_multipart_body for key, filename and content_type as suggested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127104647 * explicitly checking if value/item is string before calling value/item.get<std::string>() as requested by ngxson in https://github.com/ggml-org/llama.cpp/pull/22118#discussion_r3127111279 * Added double quote to the sanitize lambda and throw on json parse failure --------- Co-authored-by: Ralph Paßgang <ralph@trust-it.de>	2026-04-27 23:55:00 +02:00
Xuan-Son Nguyen	49bfddeca1	server: allow router to report child instances sleep status (#20849) * server: allow router to report child instances sleep status * refactor * move sleeping to state * nits	2026-03-22 18:33:52 +01:00
Evan Huus	23fbfcb1ad	server: Parse port numbers from MCP server URLs in CORS proxy (#20208) * Parse port numbers from MCP server URLs * Pass scheme to http proxy for determining whether to use SSL * Fix download on non-standard port and re-add port to logging * add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-09 17:47:54 +01:00
Pascal	2e7e638523	server : support multiple model aliases via comma-separated --alias (#19926) * server : support multiple model aliases via comma-separated --alias * server : update --alias description and regenerate docs * server : multiple model aliases and tags - address review feedback from ngxson - --alias accepts comma-separated values (std::set, no duplicates) - --tags for informational metadata (not used for routing) - aliases resolve transparently in router via get_meta/has_model - /v1/models exposes aliases and tags fields * regenerate docs * nits * server : use first alias as model_name for backward compat address review feedback from ngxson * server : add single-model test for aliases and tags	2026-02-27 07:05:23 +01:00
Vladislav Sayapin	da143b9940	server : fix router child env in containerized environments (#18562)	2026-01-05 14:12:05 +01:00
wbtek	5b1248c9af	server : Cmdline arg -to changes http read timeout from current 600sec default (#18279) * Prevent crash if TTFT >300sec, boosted to 90 days * server : allow configurable HTTP timeouts for child models * server : pass needed timeouts from params only --------- Co-authored-by: Greg Slocum <fromgit@wbtek.slocum.net>	2025-12-29 17:12:48 +01:00
Xuan-Son Nguyen	f5acfb2ffa	server: (router) add stop-timeout option (#18350) * server: (router) add stop-timeout option * also allow stop while loading * add docs * unload_lru: also wait for unload to complete	2025-12-24 23:47:49 +01:00
Xuan-Son Nguyen	9e39a1e6a9	server: support load model on startup, support preset-only options (#18206) * server: support autoload model, support preset-only options * add docs * load-on-startup * fix * Update common/arg.cpp Co-authored-by: Pascal <admin@serveurperso.com> --------- Co-authored-by: Pascal <admin@serveurperso.com>	2025-12-20 09:25:27 +01:00
Xuan-Son Nguyen	98c1c7a7bf	presets: refactor, allow cascade presets from different sources, add global section (#18169) * presets: refactor, allow cascade presets from different sources * update docs * fix neg arg handling * fix empty mmproj * also filter out server-controlled args before to_ini() * skip loading custom_models if not specified * fix unset_reserved_args * fix crash on windows	2025-12-19 12:08:20 +01:00
Pascal	6ce3d85796	server: (webui) add --webui-config (#18028) * server/webui: add server-side WebUI config support Add CLI arguments --webui-config (inline JSON) and --webui-config-file (file path) to configure WebUI default settings from server side. Backend changes: - Parse JSON once in server_context::load_model() for performance - Cache parsed config in webui_settings member (zero overhead on /props) - Add proper error handling in router mode with try/catch - Expose webui_settings in /props endpoint for both router and child modes Frontend changes: - Add 14 configurable WebUI settings via parameter sync - Add tests for webui settings extraction - Fix subpath support with base path in API calls Addresses feedback from @ngxson and @ggerganov * server: address review feedback from ngxson * server: regenerate README with llama-gen-docs	2025-12-17 21:45:45 +01:00
Xuan-Son Nguyen	bde461de8c	server: (router) allow child process to report status via stdout (#18110) * server: (router) allow child process to report status via stdout * apply suggestions	2025-12-17 14:54:11 +01:00
Pascal	f32ca51bfe	server: add presets (config) when using multiple models (#17859 ) * llama-server: recursive GGUF loading Replace flat directory scan with recursive traversal using std::filesystem::recursive_directory_iterator. Support for nested vendor/model layouts (e.g. vendor/model/.gguf). Model name now reflects the relative path within --models-dir instead of just the filename. Aggregate files by parent directory via std::map before constructing local_model server : router config POC (INI-based per-model settings) * server: address review feedback from @aldehir and @ngxson PEG parser usage improvements: - Simplify parser instantiation (remove arena indirection) - Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping) - Fix last line without newline bug (+ operator instead of <<) - Remove redundant end position check Feature scope: - Remove auto-reload feature (will be separate PR per @ngxson) - Keep config.ini auto-creation and template generation - Preserve per-model customization logic Co-authored-by: aldehir <aldehir@users.noreply.github.com> Co-authored-by: ngxson <ngxson@users.noreply.github.com> * server: adopt aldehir's line-oriented PEG parser Complete rewrite of INI parser grammar and visitor: - Use p.chars(), p.negate(), p.any() instead of p.until() - Support end-of-line comments (key=value # comment) - Handle EOF without trailing newline correctly - Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]) - Simplified visitor (no pending state, no trim needed) - Grammar handles whitespace natively via eol rule Business validation preserved: - Reject section names starting with LLAMA_ARG_ - Accept only keys starting with LLAMA_ARG_* - Require explicit section before key-value pairs Co-authored-by: aldehir <aldehir@users.noreply.github.com> * server: fix CLI/env duplication in child processes Children now receive minimal CLI args (executable, model, port, alias) instead of inheriting all router args. 
Global settings pass through LLAMA_ARG_* environment variables only, eliminating duplicate config warnings. Fixes: Router args like -ngl, -fa were passed both via CLI and env, causing 'will be overwritten' warnings on every child spawn * add common/preset.cpp * fix compile * cont * allow custom-path models * add falsey check * server: fix router model discovery and child process spawning - Sanitize model names: replace / and \ with _ for display - Recursive directory scan with relative path storage - Convert relative paths to absolute when spawning children - Filter router control args from child processes - Refresh args after port assignment for correct port value - Fallback preset lookup for compatibility - Fix missing argv[0]: store server binary path before base_args parsing * Revert "server: fix router model discovery and child process spawning" This reverts commit e3832b42eeea7fcb108995966c7584479f745857. * clarify about "no-" prefix * correct render_args() to include binary path * also remove arg LLAMA_ARG_MODELS_PRESET for child * add co-author for ini parser code Co-authored-by: aldehir <hello@alde.dev> * also set LLAMA_ARG_HOST * add CHILD_ADDR * Remove dead code --------- Co-authored-by: aldehir <aldehir@users.noreply.github.com> Co-authored-by: ngxson <ngxson@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: aldehir <hello@alde.dev>	2025-12-10 22:18:21 +01:00
Pascal	5ceed62421	server: fix duplicate HTTP headers in multiple models mode (#17698) * llama-server: fix duplicate HTTP headers in multiple models mode (#17693) * llama-server: address review feedback from ngxson - restrict scope of header after std::move - simplify header check (remove unordered_set)	2025-12-03 10:28:43 +01:00
Xuan-Son Nguyen	ec18edfcba	server: introduce API for serving / loading / unloading multiple models (#17470 ) * server: add model management and proxy * fix compile error * does this fix windows? * fix windows build * use subprocess.h, better logging * add test * fix windows * feat: Model/Router server architecture WIP * more stable * fix unsafe pointer * also allow terminate loading model * add is_active() * refactor: Architecture improvements * tmp apply upstream fix * address most problems * address thread safety issue * address review comment * add docs (first version) * address review comment * feat: Improved UX for model information, modality interactions etc * chore: update webui build output * refactor: Use only the message data `model` property for displaying model used info * chore: update webui build output * add --models-dir param * feat: New Model Selection UX WIP * chore: update webui build output * feat: Add auto-mic setting * feat: Attachments UX improvements * implement LRU * remove default model path * better --models-dir * add env for args * address review comments * fix compile * refactor: Chat Form Submit component * ad endpoint docs * Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_maagement_v1_2 Co-authored-by: Aleksander <aleksander.grygier@gmail.com> * feat: Add copy to clipboard to model name in model info dialog * feat: Model unavailable UI state for model selector * feat: Chat Form Actions UI logic improvements * feat: Auto-select model from last assistant response * chore: update webui build output * expose args and exit_code in API * add note * support extra_args on loading model * allow reusing args if auto_load * typo docs * oai-compat /models endpoint * cleaner * address review comments * feat: Use `model` property for displaying the `repo/model-name` naming format * refactor: Attachments data * chore: update webui build output * refactor: Enum imports * feat: Improve Model Selector 
responsiveness * chore: update webui build output * refactor: Cleanup * refactor: Cleanup * refactor: Formatters * chore: update webui build output * refactor: Copy To Clipboard Icon component * chore: update webui build output * refactor: Cleanup * chore: update webui build output * refactor: UI badges * chore: update webui build output * refactor: Cleanup * refactor: Cleanup * chore: update webui build output * add --models-allow-extra-args for security * nits * add stdin_file * fix merge * fix: Retrieve lost setting after resolving merge conflict * refactor: DatabaseStore -> DatabaseService * refactor: Database, Conversations & Chat services + stores architecture improvements (WIP) * refactor: Remove redundant settings * refactor: Multi-model business logic WIP * chore: update webui build output * feat: Switching models logic for ChatForm or when regenerating messges + modality detection logic * chore: update webui build output * fix: Add `untrack` inside chat processing info data logic to prevent infinite effect * fix: Regenerate * feat: Remove redundant settigns + rearrange * fix: Audio attachments * refactor: Icons * chore: update webui build output * feat: Model management and selection features WIP * chore: update webui build output * refactor: Improve server properties management * refactor: Icons * chore: update webui build output * feat: Improve model loading/unloading status updates * chore: update webui build output * refactor: Improve API header management via utility functions * remove support for extra args * set hf_repo/docker_repo as model alias when posible * refactor: Remove ConversationsService * refactor: Chat requests abort handling * refactor: Server store * tmp webui build * refactor: Model modality handling * chore: update webui build output * refactor: Processing state reactivity * fix: UI * refactor: Services/Stores syntax + logic improvements Refactors components to access stores directly instead of using exported getter functions. 
This change centralizes store access and logic, simplifying component code and improving maintainability by reducing the number of exported functions and promoting direct store interaction. Removes exported getter functions from `chat.svelte.ts`, `conversations.svelte.ts`, `models.svelte.ts` and `settings.svelte.ts`. * refactor: Architecture cleanup * feat: Improve statistic badges * feat: Condition available models based on modality + better model loading strategy & UX * docs: Architecture documentation * feat: Update logic for PDF as Image * add TODO for http client * refactor: Enhance model info and attachment handling * chore: update webui build output * refactor: Components naming * chore: update webui build output * refactor: Cleanup * refactor: DRY `getAttachmentDisplayItems` function + fix UI * chore: update webui build output * fix: Modality detection improvement for text-based PDF attachments * refactor: Cleanup * docs: Add info comment * refactor: Cleanup * re * refactor: Cleanup * refactor: Cleanup * feat: Attachment logic & UI improvements * refactor: Constants * feat: Improve UI sidebar background color * chore: update webui build output * refactor: Utils imports + move types to `app.d.ts` * test: Fix Storybook mocks * chore: update webui build output * test: Update Chat Form UI tests * refactor: Tooltip Provider from core layout * refactor: Tests to separate location * decouple server_models from server_routes * test: Move demo test to tests/server * refactor: Remove redundant method * chore: update webui build output * also route anthropic endpoints * fix duplicated arg * fix invalid ptr to shutdown_handler * server : minor * rm unused fn * add ?autoload=true\|false query param * refactor: Remove redundant code * docs: Update README documentations + architecture & data flow diagrams * fix: Disable autoload on calling server props for the model * chore: update webui build output * fix ubuntu build * fix: Model status reactivity * fix: Modality 
detection for MODEL mode * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-01 19:41:04 +01:00
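
Note on the SSE replay buffer (#23226): the message describes a producer that keeps buffering SSE bytes after the client disconnects, and a GET /v1/stream/<conv_id>?from=N route that replays them. The following is a minimal Python sketch of that idea, not the llama.cpp implementation. It assumes N is a byte offset into the buffered stream and uses an unbounded buffer where the commit mentions a ring buffer. The class name ReplayBuffer and the method read_from are hypothetical. write() and done() borrow the producer method names from the commit text.

```python
import threading
from typing import Optional, Tuple


class ReplayBuffer:
    """Append-only byte buffer that a reconnecting reader can replay from an offset.

    Hypothetical model of the mechanism described in #23226; names are illustrative.
    """

    def __init__(self) -> None:
        self._buf = bytearray()
        self._done = False
        self._cond = threading.Condition()

    def write(self, data: bytes) -> None:
        # Producer side: append SSE bytes whether or not a client is attached.
        with self._cond:
            if self._done:
                raise RuntimeError("write after done")
            self._buf += data
            self._cond.notify_all()

    def done(self) -> None:
        # Producer side: mark the clean end of the stream and wake all readers.
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def read_from(self, offset: int, timeout: Optional[float] = None) -> Tuple[bytes, bool]:
        # Consumer side: wait until bytes past `offset` exist or the stream ended,
        # then return (bytes from offset onward, finished flag).
        with self._cond:
            self._cond.wait_for(lambda: len(self._buf) > offset or self._done, timeout)
            return bytes(self._buf[offset:]), self._done
```

A client that received 9 bytes before dropping would reconnect and call read_from(9), then keep calling it with an offset advanced by the number of bytes received until the finished flag is True and no bytes remain.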

26 Commits
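
Note on the conv id to model map (#23226): the message describes a small conv_model_tracker struct that owns its mutex and exposes remember, lookup and forget, so the router can resolve the child that owns a conversation with a single lookup. The sketch below models that shape in Python for illustration only. The real tracker is a C++ struct, and the class name ConvModelTracker, the field names, and the sample ids are assumptions.

```python
import threading
from typing import Dict, Optional


class ConvModelTracker:
    """Map of conv_id -> model guarded by its own lock (one lock per tracker).

    Hypothetical Python rendering of the struct described in the commit message.
    """

    def __init__(self) -> None:
        self._mu = threading.Lock()
        self._models: Dict[str, str] = {}

    def remember(self, conv_id: str, model: str) -> None:
        # Record which model a conversation was posted to.
        with self._mu:
            self._models[conv_id] = model

    def lookup(self, conv_id: str) -> Optional[str]:
        # Resolve the owning model, or None when the conversation is unknown.
        with self._mu:
            return self._models.get(conv_id)

    def forget(self, conv_id: str) -> None:
        # Drop the entry; forgetting an unknown id is a no-op.
        with self._mu:
            self._models.pop(conv_id, None)
```

Keeping the lock inside the tracker means callers never touch it directly, which matches the stated goal of the refactor: one mutex per struct instead of a bare mutex guarding a loose map.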