16 Commits

Author SHA1 Message Date
Georgi Gerganov
6b4e4bd582
common : fix env names to all have LLAMA_ARG_ prefix (#23778) 2026-05-27 14:52:47 +03:00
Georgi Gerganov
0d18aaa9d1
ci : do not allocate ccache for 3rd-party hosted runners (#23730)
* ci : do not allocate ccache for 3rd-party hosted runners

[no release]

* cont : add prints

[no ci]
[no release]
2026-05-26 20:15:01 +03:00
Georgi Gerganov
678d43d720
ci : move more CPU jobs to self-hosted runners (#23715) 2026-05-26 15:37:40 +03:00
Georgi Gerganov
28123a3937
ci : move most slim jobs to self-hosted runners (#23619)
* ci : remove tag from build-self-hosted.yml

* ci : slim -> self-hosted

* ci : prevent heavy CPU jobs from running on fast runners

* ci : prevent cmake pkg from running on dedicated fast runners

* ci : try to bump 3.11 -> 3.13

* ci : move lint back to 3.11

* ci : back to 3.11

* ci : add comment about UI jobs

* ci : move python requirements check to CPU runners

this job is a bit slow for a dedicated "fast" runner

* ci : add self-hosted ui workflow

* ci : fix UI naming

* tmp to check if arm64 fast is compatible with all jobs

* revert last commit
2026-05-25 08:11:19 +03:00
Georgi Gerganov
549b9d8433
ci : update build-self-hosted.yml (#23616) 2026-05-24 18:20:10 +03:00
Aleksander Grygier
59778f0196
ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming (#23064)
* webui: Move static build output from `tools/server/public` to `build/ui` directory

* refactor: Move to `tools/ui`

* refactor: rename CMake variables and preprocessor defines

- Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated)
- Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated)
- Backward compat: old vars auto-forward to new ones with DEPRECATION warning
- Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc.
- Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET
- Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines
- Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED

* refactor: rename CLI flags (--webui -> --ui) with backward compat

- Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases)
- Add --ui-config (old --webui-config kept as deprecated alias)
- Add --ui-config-file (old --webui-config-file kept as deprecated alias)
- Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated)
- Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY
- C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields
- Backward compat: old fields synced to new ones in g_params_to_internals

* refactor: update C++ server internals with backward compat

- Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta)
- Rename params.webui usage -> params.ui (both synced, old still works)
- JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys
- Server routes use params.ui_mcp_proxy || params.webui_mcp_proxy
- Preprocessor guards use #if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI)

* refactor: rename CI/CD workflows, artifacts, and build script

- Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build
- Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT
- Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks
- Update server.yml: job/artifact refs webui-build -> ui-build
- Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT
- Update server-self-hosted.yml: webui-build -> ui-build
- Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION
- Rename webui-download.cmake -> ui-download.cmake (internal refs updated)
- Update labeler.yml: server/webui -> server/ui path label

* docs: update CODEOWNERS and server README docs

- Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/
- Update server README.md: CLI tables show --ui flags with deprecated --webui aliases
- Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/

* fix: Small fixes for UI build

* fix: CMake.txt syntax

* chore: Formatting

* fix: `.editorconfig` for llama-ui

* chore: Formatting

* refactor: Use `APP_NAME` in Error route

* refactor: Cleanup

* refactor: Single migration service

* make llama-ui a linkable target

* fix: UI Build output

* fix: Missing change

* fix: separate llama-ui npm build output into build/tools/ui/dist subfolder + use cmake npm build instead of downloading ui-build.yml artifacts in CI

* refactor: UI workflows cleanup

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-05-16 02:02:40 +02:00
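
The flag renames in #23064 keep the old `--webui*` spellings as deprecated aliases of the new `--ui*` flags. A minimal Python sketch of that aliasing idea follows. It is only an illustration: the actual handling lives in the C++ argument parser, the function name `normalize_args` is hypothetical, and emitting a warning for each deprecated flag is an assumption rather than something the commit message states.

```python
import warnings

# Old flag -> new flag, taken from the rename list in the commit message.
DEPRECATED_ALIASES = {
    "--webui": "--ui",
    "--no-webui": "--no-ui",
    "--webui-config": "--ui-config",
    "--webui-config-file": "--ui-config-file",
    "--webui-mcp-proxy": "--ui-mcp-proxy",
}


def normalize_args(argv):
    """Rewrite deprecated flag names to their new spelling (hypothetical helper)."""
    normalized = []
    for arg in argv:
        # Support both "--flag value" and "--flag=value" forms.
        name, sep, value = arg.partition("=")
        replacement = DEPRECATED_ALIASES.get(name)
        if replacement is not None:
            # Assumption: warn on use of a deprecated alias.
            warnings.warn(
                f"{name} is deprecated, use {replacement}",
                DeprecationWarning,
                stacklevel=2,
            )
            name = replacement
        normalized.append(name + sep + value)
    return normalized
```

Rewriting the old names up front means the rest of the program only has to know the new flags, which is one common way to keep a rename backward compatible.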
Reese Levine
834a243664
ggml-webgpu: Enable NVIDIA self-hosted CI (#22976)
* Enable nvidia ci for webgpu

* Address precision issues

* fix placement

* Relax more set_rows and div

* Try relaxing all f16

* formatting and naming

* Add comment explaining max_nmse_err logic

Added comment referencing pull request for clarification.
2026-05-14 09:41:32 -07:00
Aleksander Grygier
0c3e4fccca
fix: Propagate version tag to WebUI asset download in self-hosted CI (#23051)
* fix: Propagate version tag to WebUI asset download in self-hosted CI

* refactor: Apply suggestions from @CISC

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix: Skip npm build when Node.js is not installed

Avoid 'no such file or directory' errors on CI runners that lack
Node.js. Check if npm is available via find_program before attempting
npm install + npm run build. Falls back to HF Bucket download.

* fix: Use + separator for ASSETS list to fix Windows build

Replace fragile \; escaping with a + separator when passing the
WebUI asset list via -DASSETS to the download script. On Windows,
the \; escaping was not reliably preserved through the CMake build
system, causing all asset filenames to be concatenated into one
(e.g., 'index.html;bundle.js;bundle.css;loading.html' as a single
file), which broke the HF Bucket download and subsequent xxd.cmake
step.

+ is safe because it is not special in cmd.exe (unlike | which is a
pipe operator), not special in CMake's -D argument parser, and not
a valid Windows filename character. CMakeLists.txt joins assets
with + and webui-download.cmake splits them back via regex.

* fix: Validate HF_WEBUI_VERSION environment variable with regex

Add input validation for the HF_WEBUI_VERSION env var to prevent
CMake list separator or path-traversal issues in stamp filenames
and download URLs. Rejects non-conforming characters early.

* fix: Remove 'latest' fallback for HF_WEBUI_VERSION

When needs.determine-tag.outputs.tag_name is empty, let CMake's
default resolution handle it (empty -> git-based version lookup)
instead of falling back to 'latest'. This ensures the sentinel
stamp file is consistent with CMake's resolution logic.

* fix: Demote checksum verification failure to warning instead of hard gate

* fix: End line character

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-14 17:57:20 +02:00
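
Two of the fixes in #23051 can be sketched in a few lines: joining the asset list with `+` before passing it as a single argument and splitting it back on the other side, and rejecting version strings that contain unexpected characters. The Python below is only an illustration of those ideas. The real logic is in CMake, all function names are hypothetical, the exact version pattern is an assumption (the commit message does not quote its regex), and the guard against asset names that already contain `+` is an addition for this sketch.

```python
import re

ASSET_SEPARATOR = "+"

# Assumed allow-list: letters, digits, dot, underscore, hyphen.
# The pattern actually used in the CMake script is not given in the commit message.
VERSION_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def join_assets(assets):
    """Join asset filenames into one argument-safe string (hypothetical helper)."""
    for name in assets:
        if ASSET_SEPARATOR in name:
            # Extra guard for this sketch: the round trip would be ambiguous.
            raise ValueError(f"asset name contains separator: {name!r}")
    return ASSET_SEPARATOR.join(assets)


def split_assets(joined):
    """Split the joined string back into filenames, dropping empty parts."""
    return [part for part in re.split(r"\+", joined) if part]


def is_valid_version(version):
    """Accept only non-empty strings made of allow-listed characters."""
    if ".." in version:
        # Reject anything that looks like a parent-directory reference.
        return False
    return VERSION_PATTERN.fullmatch(version) is not None
```

For example, `join_assets(["index.html", "bundle.js"])` yields `"index.html+bundle.js"`, and `split_assets` recovers the original two names. A version such as `"a;b"` is rejected because `;` is not in the assumed allow-list, which is the kind of CMake list separator the commit message wants to keep out of stamp filenames and download URLs.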
Zijun Yu
52f1096f21
openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944)
* Thread safety per request only

* Fix ROPE yarn case

* Fix sticky stateful config

* Use i4/i8 directly for symmetric quant

* Use weightless caching

* Add WeightlessCacheAttribute to reduce NPU memory usage

* Gelu tanh support (#125)

* Imrope support (#126)

* fix(openvino): explicit ov::Tensor frees in ggml_backend_openvino_free

* add GPU,NPU support in OV Dockerfile

* add build-openvino.yml ci

* Fix sticky stateful config

* add concurrency to ov-gpu ci runs. Move OV CI to build-openvino.yml

* fix thread-safety of shared runtime context

* rope type abstraction for frontend translations

* fix editorconfig

---------

Co-authored-by: Mustafa Cavus <mustafa.cavus@intel.com>
Co-authored-by: Dan Hoffman <dhoff749@gmail.com>
Co-authored-by: Ravi Panchumarthy <ravi.panchumarthy@intel.com>
2026-04-21 18:58:34 +03:00
Reese Levine
45cac7ca70
ggml-webgpu: fix compiler warnings and refactor FlashAttention encoding (#21052)
* Update workflows to remove dependence on llvmpipe

* Try setting Dawn_DIR

* remove c++20 initializers

* Move to proper guid

* Try avoiding segfaults on vulkan backend process exit

* Remove compiler warnings on parameter casting

* Fix soft_max and update reg_tile accumulation to f32 for better precision

* Refactor flash_attn a bit

* remove c++20 initializers and format

* Increase div precision for NVIDIA

* revert div precision and comment out ggml-ci node for now

* Formatting

* Try debugging on a failing CI node

* Revert "Try debugging on a failing CI node"

This reverts commit 1971e33cba919915e12bcfd5828abfbd54ca942e.
2026-04-17 09:17:11 -07:00
Georgi Gerganov
f4b5bf2f32
ci : re-enable mac workflows (#21894)
* ci : re-enable mac workflows

* vulkan : fix compile warning
2026-04-14 15:58:09 +03:00
Masato Nakasaka
e439700992
ci: Add Windows Vulkan backend testing on Intel (#21292)
* experimenting CI

* Experimenting CI fix for MinGW

* experimenting CI on Windows

* modified script for integration with VisualStudio

* added proxy handling

* adding python version for Windows execution

* fix iterator::end() dereference

* fixed proxy handling

* Fix errors occurring on Windows

* fixed ci script

* Reverted to master

* Stripping test items to simplify Windows test

* adjusting script for windows testing

* Changed shell

* Fixed shell

* Fixed shell

* Fix CI setting

* Fix CI setting

* Fix CI setting

* Experimenting ci fix

* Experimenting ci fix

* Experimenting ci fix

* Experimenting ci fix

* experimenting fix for unit test error

* Changed to use BUILD_LOW_PERF to skip python tests

* Fix CI

* Added option to specify Ninja generator

* Reverted proxy related changes
2026-04-03 20:16:44 +03:00
Georgi Gerganov
3fab96cd04
ci : disable self-hosted mac jobs (#20985) 2026-03-25 14:46:40 +02:00
Georgi Gerganov
45172df4d6
ci : disable AMX jobs (#20654)
[no ci]
2026-03-16 22:38:59 +02:00
Sigbjørn Skjæret
b91d7dfe5b
ci : only save openvino caches on github-hosted master (#20593)
* only save openvino ccache on master

* disable toolkit cache if self-hosted

* only cache on github-hosted runners

* remove toolkit cache [no ci]
2026-03-15 18:58:13 +01:00
Georgi Gerganov
b4768955c4
ci : move self-hosted workflows to separate files (#20540) 2026-03-14 23:15:35 +02:00