Commit Graph

  • 75ad0b23ed
    server: fix remote preset handling, add test (#24938) b9770 Xuan-Son Nguyen 2026-06-23 13:28:34 +02:00
  • f7421eabe8 wip Xuan Son Nguyen 2026-06-23 13:28:14 +02:00
  • 59797670dc cli: move to HTTP-based implementation Xuan Son Nguyen 2026-06-23 13:14:28 +02:00
  • c926ad0985
    vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled (#24444) b9769 Wyatt Caldwell 2026-06-23 03:55:46 -07:00
  • a3900a6694
    model: Granite Speech Plus (#24818) b9768 Gabe Goodhart 2026-06-23 04:03:31 -06:00
  • 7c908502ea
    ggml-webgpu: improve MTP inference by using mat-vec path for small batches (#24811) b9767 Masashi Yoshimura 2026-06-23 17:13:55 +09:00
  • 035cd8f9a6
    codeowners: add yomaytk to ggml-webgpu (#24930) Masashi Yoshimura 2026-06-23 15:19:34 +09:00
  • 73618f27a8
    server: improve user message detection and create checkpoints at every user message (#24176) b9765 Aldehir Rojas 2026-06-23 00:27:28 -05:00
  • 23ee8797e1
    opencl: q8_0 gemv precision improvement (#24923) Shawn Gu 2026-06-22 22:25:21 -07:00
  • a19f3ea631 misc: update labels Xuan Son Nguyen 2026-06-23 00:22:12 +02:00
  • dec5ca5577
    server : Add id to tool call responses api (#24882) b9763 Matt Thompson 2026-06-22 14:03:12 -07:00
  • 9c0ac887f3
    ui: Prioritize favorite models in model selection (#24766) Mahdiou Diallo 2026-06-22 21:00:21 +02:00
  • 095058ca19 add arg --threads-sampling xsn/server_multithread_sampling Xuan Son Nguyen 2026-06-22 20:03:49 +02:00
  • c62fdd5fd0 working Xuan Son Nguyen 2026-06-22 19:38:25 +02:00
  • 41ed530be2 wip Xuan Son Nguyen 2026-06-22 19:30:11 +02:00
  • fe03cce8db server: run sampling in a threadpool Xuan Son Nguyen 2026-06-22 19:05:39 +02:00
  • 721354fbdf
    server: (router) move model downloading to dedicated process (#24834) b9761 Xuan-Son Nguyen 2026-06-22 18:24:04 +02:00
  • 6ee0f65793
    server: refactor/generalize input file schema (#24299) b9760 Xuan-Son Nguyen 2026-06-22 16:42:47 +02:00
  • 1b82e9ae51 fix windows xsn/server_input_file_schema Xuan Son Nguyen 2026-06-22 16:20:56 +02:00
  • 61653c7989 Merge branch 'master' into xsn/server_input_file_schema Xuan Son Nguyen 2026-06-22 16:19:59 +02:00
  • 099b579acb
    ui: model status and load progress via /models/sse feed (#24878) Pascal 2026-06-22 15:55:30 +02:00
  • 037397792a vulkan: split ggml-vulkan.cpp file 0cc4m/vulkan-cpp-split Ruben Ortlam 2026-06-22 15:50:01 +02:00
  • f8cc15f163
    [SYCL] support bf16 on bin_bcast OP and unary OPs (#24838) b9758 Neo Zhang 2026-06-22 19:09:02 +08:00
  • 37957e8531
    sampling : remove unconditional softmax+sort in top-n-sigma sampler (#22645) b9757 Tim Neumann 2026-06-22 13:08:32 +02:00
  • d0f9d2e5ac
    server: fix edit_file crash on append at end of file (line_start -1) (#24893) b9756 Pascal 2026-06-22 10:55:28 +02:00
  • 0ef6f06d55
    docs/android.md: Add dependency libandroid-spawn for building in termux (#21812) b9755 aafsmarak 2026-06-22 09:18:31 +05:30
  • 52b3df0023
    common/peg : implement ac parser for stricter grammar generation (#24869) b9754 Aldehir Rojas 2026-06-21 16:20:58 -05:00
  • 7c082bc417
    server: fix report progress for loading spec models, add "stages" list (#24870) b9753 Xuan-Son Nguyen 2026-06-21 17:36:52 +02:00
  • bddfd2b113
    server: refactor batch construction (#24843) b9752 Xuan-Son Nguyen 2026-06-21 14:16:11 +02:00
  • 0d135df48c
    mtmd: fix mtmd_get_memory_usage (#24867) b9751 Xuan-Son Nguyen 2026-06-21 14:12:15 +02:00
  • bf533823cd
    jinja : implement call statement (#24847) b9750 Sigbjørn Skjæret 2026-06-21 14:04:52 +02:00
  • 2f89acc2bc
    mtmd: add load progress callback (#24865) Xuan-Son Nguyen 2026-06-21 13:40:52 +02:00
  • 7ac864bf97 disable DEBUG_TIMINGS xsn/server_refactor_batch Xuan Son Nguyen 2026-06-21 13:38:09 +02:00
  • d37414510b address comments Xuan Son Nguyen 2026-06-21 13:15:58 +02:00
  • bfa3219177
    server: add "verbose" field to schema (#24864) b9748 Xuan-Son Nguyen 2026-06-21 13:03:14 +02:00
  • d6d899580d
    server: real-time model load progress tracking via /models/sse (#24828) b9747 Xuan-Son Nguyen 2026-06-21 11:58:14 +02:00
  • f1ef61fb1b server: add "verbose" field to schema xsn/server_verbose_field Xuan Son Nguyen 2026-06-21 11:16:06 +02:00
  • 8a118ee86c
    minor : clean-up whitespaces (#24862) Georgi Gerganov 2026-06-21 11:37:12 +03:00
  • d789527482
    spec : Support Step3.5/3.7 flash mtp3 (#24340) b9745 YiChen Lv 2026-06-21 16:33:18 +08:00
  • 063d9c156e
    common/peg : refactor until gbnf grammar generation (#24839) b9744 Aldehir Rojas 2026-06-20 21:15:06 -05:00
  • c57607016a
    common/json-schema-to-grammar : align spacing rules with parsers (#24835) b9743 Aldehir Rojas 2026-06-20 17:43:04 -05:00
  • 4a80943174
    fix(hexagon): use padded stride for ssm-conv weights (#24470) b9742 Guanhuai Zhang 2026-06-21 05:58:49 +08:00
  • 447b0c3646 poc: threadpool sampling xsn/tmp_smpl_parallel Xuan Son Nguyen 2026-06-20 22:08:42 +02:00
  • a527509d0f debug: force llama_synchronize for accurate timings Xuan Son Nguyen 2026-06-20 20:22:31 +02:00
  • 7486a39756 (debug) add timings Xuan Son Nguyen 2026-06-20 20:12:05 +02:00
  • 84de01a1f1
    llama : use LLM_KV for quantization_version & file_type (#24802) b9741 Adrien Gallouët 2026-06-20 20:07:01 +02:00
  • ea65a4b1c8 small nits Xuan Son Nguyen 2026-06-20 19:54:31 +02:00
  • b28e3682e5 Merge branch 'master' into xsn/server_refactor_batch Xuan Son Nguyen 2026-06-20 19:48:36 +02:00
  • 53763db789 rm debug log Xuan Son Nguyen 2026-06-20 19:48:14 +02:00
  • 75f460ac28
    arg: try fixing test-args-parser randomly fails (#24826) b9740 Xuan-Son Nguyen 2026-06-20 19:45:27 +02:00
  • bf36838ebd fix assert Xuan Son Nguyen 2026-06-20 19:32:47 +02:00
  • 64ec03d10b handle batch full more carefully Xuan Son Nguyen 2026-06-20 19:30:59 +02:00
  • d704c7929b add abort_all_slots Xuan Son Nguyen 2026-06-20 19:20:12 +02:00
  • af583e3ed3 wip 4 Xuan Son Nguyen 2026-06-20 19:18:05 +02:00
  • b786bb2e60 wip 3 Xuan Son Nguyen 2026-06-20 18:56:58 +02:00
  • 2b2eed8fd7 wip 2 Xuan Son Nguyen 2026-06-20 18:41:56 +02:00
  • 8452824611
    release: add missing link for win opencl adreno arm64 (#24809) b9739 Muhammad Salem 2026-06-20 18:08:59 +03:00
  • 6c5c5a29d6 wip Xuan Son Nguyen 2026-06-20 16:48:12 +02:00
  • d5037c508a server: refactor batch construction Xuan Son Nguyen 2026-06-20 16:35:57 +02:00
  • e27f308597
    server: avoid forwarding auth headers in CORS proxy (#24373) b9738 Matti4 2026-06-20 15:34:47 +02:00
  • 67e9fd3b74
    docker : prebuild web UI for s390x build [no release] (#24829) b9737 Aldehir Rojas 2026-06-20 05:54:42 -05:00
  • 796f41bedc
    model : glm-dsa load DSA indexer tensors as optional (#24770) b9736 davidrhodus 2026-06-20 03:48:24 -07:00
  • 37a77fb057
    ggml : optimize AMX (#24806) b9735 Adrien Gallouët 2026-06-20 12:43:06 +02:00
  • f4043fec01
    convert : more consistent handling of rope_parameters (#24833) Sigbjørn Skjæret 2026-06-20 12:42:36 +02:00
  • f449e05537
    ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA b9733 Masashi Yoshimura 2026-06-20 08:12:32 +09:00
  • 2b686a9120
    server: refactor child --> router communication (#24821) b9732 Xuan-Son Nguyen 2026-06-20 01:02:26 +02:00
  • 4b48a53b6c
    server : optimize get_token_probabilities (#24796) b9731 Adrien Gallouët 2026-06-19 23:26:54 +02:00
  • e475fa2b5f
    mtmd, arg: fix utf8 handling on windows (#24779) b9730 Xuan-Son Nguyen 2026-06-19 22:28:38 +02:00
  • 175147e8f6
    server: remove all internal mentions about "webui" (#24817) b9729 Xuan-Son Nguyen 2026-06-19 22:12:46 +02:00
  • fabde3bf51
    arg: Add comment line support to --api-key-file (#23168) b9728 Mikolaj Kucharski 2026-06-19 15:33:54 +00:00
  • 0d2d9ccbf6
    vendor : update cpp-httplib to 0.48.0 (#24787) b9727 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-06-19 11:16:35 -03:00
  • 8c2d6f6475
    server: add --agent arg, remove redundant webui naming compat (#24801) b9726 Xuan-Son Nguyen 2026-06-19 16:06:13 +02:00
  • 38724ab593
    docker : build the UI (#24794) b9725 Aldehir Rojas 2026-06-19 08:32:31 -05:00
  • e2e7a9b2d0
    mtmd: several bug fixes (#24784) b9724 Xuan-Son Nguyen 2026-06-19 12:18:36 +02:00
  • b14e3fb90c
    spec: support eagle3 for qwen3.5 & 3.6 (#24593) b9723 Ruixiang Wang 2026-06-19 12:08:50 +02:00
  • 5a7462237e remove duplicated init calls 0cc4m/server-memory-limit Ruben Ortlam 2026-05-24 10:15:09 +02:00
  • 79210e3046 cleanup unused variable Ruben Ortlam 2026-05-21 11:03:16 +02:00
  • 84c4214b39 precompute name->buft map, map GPU host types to CPU buft Ruben Ortlam 2026-05-18 14:22:21 +02:00
  • dbc5f7ec82 move model memory estimation to subprocess Ruben Ortlam 2026-05-13 17:50:11 +02:00
  • 384a495a00 extract duplicated check into helper function Ruben Ortlam 2026-05-13 15:29:24 +02:00
  • 997491a644 replace device memory map with buft memory map. Use llama_get_memory_breakdown Ruben Ortlam 2026-05-13 15:13:13 +02:00
  • a35afd504f cont : clean-up Georgi Gerganov 2026-04-16 14:32:47 +03:00
  • 3046b8853a also strip models memory margin from child processes Ruben Ortlam 2026-04-13 10:14:53 +02:00
  • 216aaf1ad6 improve variable naming, fix style Ruben Ortlam 2026-04-07 13:35:02 +02:00
  • ff41b3dbf7 improve memory_per_device map naming Ruben Ortlam 2026-04-07 13:28:49 +02:00
  • 0e2f08a535 fix model count exceeded check Ruben Ortlam 2026-04-02 11:39:36 +02:00
  • 669948ce12 move llama_context_device_memory function to llama-ext.h Ruben Ortlam 2026-04-02 11:39:07 +02:00
  • 09d8eb95a4 add server memory debug logging Ruben Ortlam 2026-04-02 10:07:04 +02:00
  • c749b6882c use memory margin instead of total size limit, apply to each device separately Ruben Ortlam 2026-04-02 09:24:53 +02:00
  • 4ed48154b0 only set model memory_mb if not previously calculated Ruben Ortlam 2026-03-31 17:37:16 +02:00
  • 6178b8755d use no_alloc to get memory requirements for model load Ruben Ortlam 2026-03-31 16:18:03 +02:00
  • 340c867179 estimate with to-be-loaded model size included Ruben Ortlam 2026-03-29 12:18:51 +02:00
  • f38c4f9419 server: add --models-memory-max parameter to allow dynamically unloading models when they exceed a memory size threshold Ruben Ortlam 2026-03-29 10:00:49 +02:00
  • 159d093a43
    server: fix non-bound n_discard value (ctx shifting) (#24786) b9722 Xuan-Son Nguyen 2026-06-19 10:53:44 +02:00
  • 5fd2dc2c41 sync : ggml b9721 Georgi Gerganov 2026-06-19 10:18:14 +03:00
  • 1868af13ac ggml : bump version to 0.15.2 (ggml/1548) Georgi Gerganov 2026-06-19 10:14:26 +03:00
  • 5bd21b8555
    pi : remove docs from system prompt (#24791) Georgi Gerganov 2026-06-19 09:34:00 +03:00
  • 80452d65b9
    server : consolidate slot selection into get_available_slot (#24755) b9718 Georgi Gerganov 2026-06-19 09:22:34 +03:00
  • 8141e730f1
    ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753) b9717 shalinib-ibm 2026-06-19 11:25:38 +05:30
  • db52540f73
    mtmd: add batching support for internvl (#24775) b9716 Xuan-Son Nguyen 2026-06-19 01:16:16 +02:00