ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-06-28 04:30:15 -05:00

Author	SHA1	Message	Date
Kawrakow	997c587a6c	Fix #1837 (#1838 )	2026-05-19 17:56:21 +03:00
Kawrakow	27d7a74389	Compiler warnings	2026-05-19 05:51:27 +00:00
firecoperana	9ad8b8c6db	common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#1822 ) * grammar: Fix grammar root symbol check (#19761) * grammar: fix bad check for root symbol, correct error logging * add tests to demonstrate root symbol check failure # Conflicts: # tests/test-grammar-integration.cpp * common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604) * grammar: add test case for nullable symbol loop Reproduce stack overflow (or OOM) with ( [x]* )* found while adding GBNF support to ripgrep-edit. llama-server reproducer: curl \ -X POST \ -d '{ "messages": [{ "role": "user", "content": "write yes" }], "grammar": "root ::= ( [x]* )" }' \ -H "Content-Type: application/json" \ http://localhost:8811/v1/chat/completions grammar: prevent stack overflow with nullable symbol loop Fix a potential stack overflow in llama_grammar_advance_stack that could occur when processing grammars with nullable symbols that lead to infinite derivations of empty strings. The fix introduces cycle detection by tracking visited stacks to prevent infinite recursion. rg-edit regexp: llama_grammar_advance_stack rg-edit extra-args: -A20 rg-edit directive: """Rewrite: fix the following segfault: [..] ⚫ Testing segfault. 
Grammar: root ::= ( [x]* )* root ::= ( [x]* )* Segmentation fault build/bin/test-grammar-integration""" gptel-context: (("~/llama.cpp/src/llama-grammar.cpp") ("~/llama.cpp/tests/test-grammar-integration.cpp") ("~/llama.cpp/grammars/./list.gbnf") ("~/llama.cpp/grammars/./json_arr.gbnf") ("~/llama.cpp/grammars/./json.gbnf") ("~/llama.cpp/grammars/./japanese.gbnf") ("~/llama.cpp/grammars/./english.gbnf") ("~/llama.cpp/grammars/./chess.gbnf") ("~/llama.cpp/grammars/./c.gbnf") ("~/llama.cpp/grammars/./arithmetic.gbnf") ("~/llama.cpp/grammars/./README.md")) * grammar: convert recursive llama_grammar_advance_stack to iterative This change converts the function to an iterative approach using explicit stacks, which prevents deep recursion and eliminates the risk of stack overflow. rg-edit regexp: llama_grammar_advance_stack rg-edit extra-args: -A30 rg-edit directive: """Rewrite: fix the following segfault: [..] ⚫ Testing segfault. Grammar: root ::= ( [x]* )* root ::= ( [x]* )* Segmentation fault build/bin/test-grammar-integration convert from recursive to interactive""" gptel-context: (("~/llama.cpp/src/llama-grammar.cpp") ("~/llama.cpp/tests/test-grammar-integration.cpp") ("~/llama.cpp/grammars/./list.gbnf") ("~/llama.cpp/grammars/./json_arr.gbnf") ("~/llama.cpp/grammars/./json.gbnf") ("~/llama.cpp/grammars/./japanese.gbnf") ("~/llama.cpp/grammars/./english.gbnf") ("~/llama.cpp/grammars/./chess.gbnf") ("~/llama.cpp/grammars/./c.gbnf") ("~/llama.cpp/grammars/./arithmetic.gbnf") ("~/llama.cpp/grammars/./README.md")) v2: Added a `std::set` to perform tree-based lookups with O(N log N) complexity. Testing with a parallel run of `test-grammar-integration` shows a double-digit percentage increase in runtime. An `unordered_set` with O(1) hashing was also evaluated, but the overhead of constructing hash keys from pointers made it significantly slower than the rbtree implementation that only requires an ordering operator. 
The performance regression in the test suite appears justified by the overall reduction in algorithmic complexity. Co-developed-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com> * grammar: add test case for hang in repetition grammar processing This commit adds a new test case to the grammar integration tests that specifically targets a hang scenario in the repetition grammar parser found while adding GBNF support to ripgrep-edit. llama-server reproducer: curl \ -X POST \ -d '{ "messages": [{ "role": "user", "content": "write yes" }], "grammar": "root ::= (([^x]){0,99}){0,99}" }' \ -H "Content-Type: application/json" \ http://localhost:8811/v1/chat/completions grammar: add repetition threshold check The change introduces a maximum repetition threshold to avoid excessive rule expansion during grammar parsing. When parsing repetition patterns like {m,n}, the parser now calculates the potential number of rules that would be generated and throws an error if the product of previous rules and new rules exceeds the threshold. A test case was added to verify the threshold is properly enforced for deeply nested repetition patterns that would otherwise cause hangs. --------- Co-authored-by: Asbjørn Olling <asbjornolling@gmail.com> Co-authored-by: Andrea Arcangeli <aarcange@redhat.com>	2026-05-19 08:36:49 +03:00
David Young	c07a052315	MLA tensor parallelism under -sm graph (DEEPSEEK2/GLM_DSA/MISTRAL4) (#1821 ) * MLA tensor parallelism under -sm graph (DEEPSEEK2/GLM_DSA/MISTRAL4) Extends -sm graph (split-mode graph) to MLA-style attention across the DEEPSEEK2, GLM_DSA, and MISTRAL4 architectures. Previously these archs fell back to -sm layer regardless of the user's flag. Implementation: - Per-rank attention build in build_deepseek2_tp_attention with view-sliced FlashAttention, split-buffer output projection, and ggml_reduce across devices - wk_b / wv_b absorbed weights replicated per device via materialize() in llm_prepare_mla (these can't live in a split buffer) - KV cache replication path (replicated_k_l) for graph-mode TP - distribute_mla_tensors_for_split_mode_graph routes attention/norm tensors into ctx_split; expert tensors stay per-layer - Implements ggml_backend_cuda_split_buffer_get_tensor for the replicated / row-split / col-split inverse paths - Early-reject guard in src/llama.cpp that auto-downgrades -sm graph to -sm layer (with a warning) when incompatible loader flags are set: -ncmoe, -cmoe, -ot, -rtr, -muge New CLI flag: - -gap \| --graph-attn-precision <f16\|f32> (default f16) See the PR description for the full validation matrix (3 archs x 2/4/8 GPU counts), perf numbers, VRAM accounting, and known limitations. * Some tweaks * materialize lambda: per-head split for graph-mode tp_replicate 7dd19e19 changed wk_b/wv_b distribution from mirror to per-head split (split_dim=2) via prepare_split_tensors. That path only fires when wk_b/wv_b are loaded from GGUF. Models that store only wkv_b in GGUF derive wk_b/wv_b at load via llm_prepare_mla, going through the materialize lambda, which was untouched and still produced mirror replicas (split_dim=-1, full n_head per device). build_deepseek2_tp_attention now does mul_mat(wk_b_local, q_nope_perm) without the prior view_3d slice, so a mirror replica passes an n_head tensor where the kernel expects n_head_local. 
Result: silent SIGSEGV right after model load. Mirror logic in materialize is replaced with the same per-head split as prepare_split_tensors: head_offsets derived from wo split, each rank gets a tensor with ne[2]=n_head_local, data copied from the appropriate source byte slice. Singular `computed` tensor keeps full metadata for tensors_by_name lookups. Tested: 8x3090, -sm graph -mla 3 -fa on now boots cleanly and sweep-benches without crash. Log confirms new path: "Computed blk.X.attn_k_b.weight ... split across N devices on dim=2". * cleanup: indent fix + remove dead view_3d slicing and debug printf - build_deepseek2.cpp: re-indent the self_attention block in build_deepseek2_layer_attention (lines 253-670). Block was at column 0 inside a function body; now at the expected 4/8-space indent. - build_deepseek2.cpp: drop the commented-out view_3d slicing and debug printfs left over after 7dd19e19's switch to direct mul_mat on per-rank wk_b_local / wv_b_local. Update the stale 'wk_b is replicated (split_dim=-1)' comment to match the new split_dim=2 reality. - ggml-cuda.cu: remove the leftover debug printf in ggml_backend_cuda_split_buffer_get_tensor. No behavior change. Verified with a clean rebuild and DSV2.5 + GLM-4.7-Flash sweep-bench runs. * llm_load_tensors: gate incompatible-flag warning to MLA archs The -ncmoe / -rtr / -muge / -ot warning under -sm graph currently fires for all archs that support graph mode. That's an over-reach: the incompatibility is specific to the MLA TP paths (DEEPSEEK2, GLM_DSA, MISTRAL4) — Gemma4 graph mode existed pre-PR and works with those flags. Gate the warning to MLA archs only. Also refreshes two stale comments left over from the wk_b/wv_b mirror -> per-head-split rewrite: - src/llama.cpp llm_prepare_mla: "Replicate wk_b/wv_b ..." now reads "Per-head split wk_b/wv_b ..." to match what the materialize lambda actually does post-823a39e2. 
- src/llama-load-tensors.cpp distribute_mla_tensors_for_split_mode_graph: drop the wkv_b row-split mention (wkv_b is no longer created under graph mode after 7dd19e19) and correct the wk_b/wv_b distribution description (per-head split, not per-device replicated). --------- Co-authored-by: Kawrakow <iwankawrakow@gmail.com>	2026-05-19 08:36:17 +03:00
firecoperana	104846ddee	spec : discard last drafted token with low prob (#1820 ) * spec : discard last drafted token with low prob * Apply suggestion from @ikawrakow Co-authored-by: Kawrakow <iwankawrakow@gmail.com> --------- Co-authored-by: firecoperana <firecoperana> Co-authored-by: Kawrakow <iwankawrakow@gmail.com>	2026-05-19 08:35:35 +03:00
Joel Farthing	f43a9f1cf6	Add per-byte CUDA MoE offload threshold (#1813 ) Co-authored-by: Joel Farthing <262452229+joelfarthing@users.noreply.github.com>	2026-05-19 08:35:05 +03:00
firecoperana	f645ed1e2d	AutoParser: improve reasoning budget and handling of space/newline in tool calls (#1819 ) common/chat, server: refactor, move all conversion functions to common, add tests (#20690) jinja : remove unused header (#22310) common : fix jinja warnings with clang 21 (#22313) Signed-off-by: Adrien Gallouët <angt@huggingface.co> chat: fix handling of space in reasoning markers (#22353) * chat: fix handling of space in reasoning markers common : re-arm reasoning budget after DONE on new <think> (#22323) common : determine generation prompt using longest common prefix (#22657) common/autoparser: fixes for newline handling / forced tool calls (#22654) * chat/autoparser: the fixes * Move optspace() to chat-peg-parser, comment out server tests invalidated due to content now allowed with forced tool calls. * Trim whitespace on apply instead common/chat : preserve media markers for typed-content templates (#22634) common : revert reasoning budget +inf logit bias (#22740) common : do not wrap raw strings in schema parser for tagged parsers (#22827) common : enable streaming JSON argument values (#23173) * common : remove atomic from json arguments * common : remove parsing logic on JSON arguments common : do not pass prompt tokens to reasoning budget sampler (#22488) reasoning-budget: clone should do a deep-copy (#23095) Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>	2026-05-19 08:34:19 +03:00
Kawrakow	40aae0b6d8	Check for output_extra.weight when loading Gemma4 assistant models (#1817 )	2026-05-18 08:17:05 +03:00
Kawrakow	a407b9ca3d	Fix Qwen3.6-MoE low MTP acceptance rate (#1815 ) * Fix Qwen3.6-MoE low MTP acceptance rate * Fix Gemma4 MTP	2026-05-18 07:26:17 +03:00
gapeleon	c35189d83c	fix(server): reset chat parser on slot reuse to prevent crash (#1763 ) (#1794 ) If a slot is reused for a standard completion (`/v1/completions`) after being used for a chat completion (`/v1/chat/completions`), the previous chat's PEG parser would remain active in the slot's parameters. This caused standard text completions to throw on the raw text.	2026-05-17 18:26:45 +03:00
Kawrakow	0ab9bdf793	Fix Qwen3.5/3.6 MTP and -muge (#1816 )	2026-05-17 17:14:47 +03:00
Kawrakow	1f8c603d9c	Quantize: add extra output tensor for MTP (#1810 ) * Quantize: add extra output tensor for MTP * Consistently use --mtp-requantize-output-tensor	2026-05-17 13:59:56 +03:00
Kawrakow	3e573cfea6	MTP: option to use re-quantized output tensor for better TG performance (#1809 ) * Option to use re-quantized output tensor for MTP * Remove quantize extra output option * Handle interleaved types	2026-05-16 14:40:18 +03:00
Kawrakow	5cc0d86c76	imatrix: use data for ffn_up when data for ffn_gate is missing (#1806 )	2026-05-15 14:38:16 +03:00
Samuel Oliveira Alves	f4f4b3ff26	Allow dual speculative decoding (#1789 ) * wip: test logic to use multiple specs * feat: introduce composite speculative decoding stages * handle MTP context and draft invalidation * fix: allow gemma mtp for speculative stages * fix: normalize spec stage keys * refactor: remove enable_mtp flag and improve speculative stage handling * fix: update cached text tokens handling for stage chains * feat: implement sync for external MTP after non-MTP accept	2026-05-15 10:10:40 +03:00
Jun Yamog	53cd4d0ff0	fix: use mmq for volta quantized matmuls (#1785 )	2026-05-15 08:11:49 +03:00
Samuel Oliveira Alves	40b65d8f54	feat: add support for draft imatrix output file (#1803 )	2026-05-15 08:10:58 +03:00
Kawrakow	4e1851b01a	imatrix: use data for ffn_up when data for ffn_gate is missing (#1805 )	2026-05-15 07:28:34 +03:00
Kawrakow	ba72890076	Faster imatrix (#1801 ) * Faster imatrix on AVX2 * Slightly better	2026-05-15 07:15:16 +03:00
Samuel Oliveira Alves	35fbe08d6e	disable MTP for parallel slots (#1804 )	2026-05-15 07:11:04 +03:00
Samuel Oliveira Alves	0fcffdb64d	feat: map Gemma 4 tensor and support with imatrix (#1796 )	2026-05-14 09:01:24 +03:00
Marian M.	b2e7f7f6cd	Update docs (#1800 ) * Update README.md - New model - New features * Update parameters.md - Recent new parameters	2026-05-14 08:44:58 +03:00
Kawrakow	949bb8f1d6	More MTP tweaks (#1792 )	2026-05-13 17:55:43 +03:00
ubergarm	ca52a825db	feat: add --threads-mtmd for independent multimodal thread count (#1797 ) Add `-tm` / `--threads-mtmd` to control CPU thread count used during multimodal image/audio processing (mmproj encoding), separate from the main LLM thread count. This allows running the LLM on GPU with minimal CPU threads (e.g. `-t 1`) to reduce sync overhead, while using many threads (e.g. `-tm 16`) for CPU-bound mmproj encoding with `--no-mmproj-offload`. Fallback chain when `-tm` is not specified: 1. `--threads-batch` (-tb) — multimodal encoding is a batch/prefill-like operation, so it makes sense to track with batch thread count 2. `--threads` (-t) — final default Works with both mtmd-cli and llama-server. AI: ubergarm/Qwen3.6-27B-GGUF MTP IQ4_KS 15.113 GiB (4.752 BPW) + pi.dev	2026-05-13 17:44:43 +03:00
Forkoz	8a0f912cb2	Remove outdated asserts from mmproj (#1795 )	2026-05-13 17:40:11 +03:00
Kawrakow	6b221f0c1f	Fix ggml_nbytes (#1798 )	2026-05-13 17:39:25 +03:00
Kawrakow	397150caa2	MTP: faster recurrent state restore (#1791 ) * MTP: store ready per step convolution states * Cleanup	2026-05-13 11:00:24 +03:00
Kawrakow	86b5d076c5	Gemma4 MTP: avoid casting KV cache to f32 (#1786 )	2026-05-13 09:11:27 +03:00
ubergarm	f478a3ec0b	fix: only inflate n_batch for GPU-offloaded mmproj, not CPU (#1788 ) The get_batch_ubatch() function unconditionally inflated n_batch and n_ubatch whenever --mmproj was specified, regardless of whether the mmproj model actually ran on the GPU. This boosted batch size applies to both the main context and the MTP draft context, since params_base.speculative.cparams_dft is derived from common_context_params_to_llama(params_base). When mmproj runs on CPU (--no-mmproj-offload), this batch inflation is unnecessary for mmproj itself (CPU compute is sized by image dimensions independently), but it still inflates the MTP compute buffer proportionally. For large images (e.g. --image-max-tokens 4096), the MTP compute buffer ballooned to ~2020 MiB and triggered an OOM even though the mmproj model was fully on CPU and should have saved VRAM. Restrict the batch inflation to !params.mmproj.path.empty() && params.mmproj_use_gpu so it only triggers when mmproj actually occupies GPU memory. When mmproj runs on CPU, the existing per-chunk decode splitting in mtmd_helper_decode_image_chunk_impl handles large images correctly with the default batch size. AI: ubergarm/Qwen3.6-27B-GGUF MTP IQ4_KS 15.113 GiB (4.752 BPW) + pi.dev	2026-05-13 09:08:42 +03:00
firecoperana	cdc288bc97	server: reset cache tokens after pp stops (#1787 ) Co-authored-by: firecoperana <firecoperana>	2026-05-13 09:05:32 +03:00
Kawrakow	f9a93c37e2	Fix GLM-4.5 MTP loading (#1784 )	2026-05-12 18:06:17 +03:00
Jun Yamog	8b0cd0357a	fix: keep sm70 cublas f32 outputs in f32 (#1776 )	2026-05-12 07:38:42 +03:00
Kawrakow	cec1a6c1f5	MTP: Reuse graphs (again) (#1780 )	2026-05-12 07:36:12 +03:00
Samuel Oliveira Alves	be8435793e	Pre-allocate buffers for hybrid model checkpoints (#1774 ) * hybrid-spec: improve recurrent checkpoint handling in speculative decoding * change per-step save to support scheduling and asynchronous tensor operations * remove redundant backend tensor fallback * improve recurrent tensor handling for split graph	2026-05-12 07:21:25 +03:00
Lingfeng Ren	c2f498ab4c	MTP: use target slot position for drafting (#1781 )	2026-05-12 07:21:03 +03:00
Kawrakow	eb570eb966	MTP: Avoid per step SSM copy (#1778 ) * Avoid copying the per-step SSM state (CUDA) * Avoid copying the per-step SSM state (CPU) * Allocate only what is necessary for per-step SSM state * Cleanup	2026-05-11 18:15:55 +03:00
Kawrakow	3557b446f8	Avoid recurrent state copy (#1777 )	2026-05-11 13:13:59 +03:00
Kawrakow	94940cd882	MTP: enable per step recurrent state for split mode graph (#1773 )	2026-05-11 12:40:04 +03:00
Lingfeng Ren	35845dd975	server : support MTP with multimodal prompts (#1758 ) Synchronize MTP state after mtmd decode batches so multimodal prompt chunks do not desync the draft context.	2026-05-11 09:51:07 +03:00
Kawrakow	23127139cb	Fix Mistral3 split mode graph (#1771 )	2026-05-10 17:05:13 +03:00
Kawrakow	4bbdb8ed0b	Faster per step recurrent state restore when using MTP (#1767 )	2026-05-10 07:51:06 +03:00
Samuel Oliveira Alves	c2b8bca807	Add MTP Support for Gemma 4 (#1744 ) * gemma-mtp: build the arch to load the MTP model * gemma-mtp: fix mtp kv state * gemma-mtp: refactor some functions and create gguf * gemma-mtp: make usable for embeddings models variant * gemma-mtp: fix qwen mtp load in graph split * gemma-mtp: refactor tensor creation and adjust output tensor handling * Gemma 4 MTP: improve tensor handling, and adjust split mode logic	2026-05-10 07:44:20 +03:00
XZiar	ab0f22b819	Use AVX version VNNI intrinsic when AVX512VNNI not available. (#1748 ) * Use AVX version VNNI intrinsic when AVX512VNNI not available. * remove changes under HAVE_FANCY_SIMD --------- Co-authored-by: XZiar <xziar@xziar.xziar>	2026-05-09 09:02:06 +03:00
Alex	51331f4973	Fix two speculative-decoding crashes that prevent any usage (#1760 ) This patch addresses two latent bugs in examples/speculative/speculative.cpp that prevent llama-speculative.exe from running on greedy sampling (temp=0) or producing rejection-sampling output (temp>0): 1. Line 191: `params.sparams.grammar = { COMMON_GRAMMAR_TYPE_NONE, "" };` invokes `common_grammar(type, grammar)` which asserts `type != NONE \|\| !grammar.empty()`. Both conditions fail with the intended-to-be-empty grammar, so every speculative run hits a hard `GGML_ASSERT` in common/sampling.h:63 immediately after model load. Fix: default-construct via `common_grammar{}` to bypass the field-init constructor. 2. Lines 293-294: `GGML_ASSERT(dist_tgt.sorted)` and `GGML_ASSERT(dist_dft.sorted)` fire whenever the draft sampler does not set the .sorted flag (which is most modern sampler paths). Comment them out — the next ~10 lines re-sort both distributions by id explicitly, so the assertion is incorrect anyway. Fix: replace the asserts with an explanatory comment. After both fixes, `llama-speculative.exe` runs to completion. The acceptance-rate measurement at temp=0 still looks suspicious (0% across same-family draft/target pairs), but that is a different issue out of scope for this PR. Tested on Qwen3-0.6B-IQ4_XS drafting Qwen3-1.7B-IQ4_XS, both base models from `bartowski/Qwen_Qwen3-*-GGUF` on Windows + ik_llama.cpp build at HEAD of windows-mingw-default-win10 (which is itself a follow-up to PR #1755).	2026-05-09 08:36:38 +03:00
Kawrakow	96127976f2	Use AVX2 when available for greedy speculative sampling (#1761 ) * Use AVX2 when available for greedy speculative sampling * Avoid some code duplication	2026-05-09 08:32:20 +03:00
Kawrakow	2f0b47c19d	Use async copies to save/restore recurrent state (#1759 )	2026-05-09 08:31:56 +03:00
Kawrakow	9f60de9cc5	Fix discarding tokens from the KV cache during MTP drafting (#1757 )	2026-05-09 08:31:25 +03:00
Alex	98950267c6	ggml : default GGML_WIN_VER to 0x0A00 (Windows 10) (#1755 ) The default of 0x602 (Windows 8) causes a build failure on any toolchain where _WIN32_WINNT propagates into vendored cpp-httplib (notably MinGW with the bundled w64devkit GCC). cpp-httplib's httplib.h has, for some time now, contained: #ifdef _WIN32 #if defined(_WIN32_WINNT) && _WIN32_WINNT < 0x0A00 #error "cpp-httplib doesn't support Windows 8 or lower. Please use Windows 10 or later." #endif #endif so the entire llama-server target fails to compile on Windows + MinGW unless the user passes -DGGML_WIN_VER=0x0A00 manually. Bumping the default to 0x0A00 (Windows 10) keeps Windows 8 reachable for anyone who explicitly requests it (-DGGML_WIN_VER=0x602) while letting the default Windows + MinGW build succeed end-to-end. Windows 8 / 8.1 reached end of support in January 2023, and Windows 10 is a strict superset of the Win8 surface used elsewhere (PrefetchVirtualMemory etc.), so this is strictly additive on the API side. Verified by building with w64devkit 2.8.0 (gcc 16.1.0) on Windows 11 without any -DGGML_WIN_VER override: all 266 ninja targets link cleanly, including bin/llama-server.exe, and llama-cli runs Qwen3-4B-Thinking-2507 IQ4_XS at ~6.2 tok/s with q8_0 KV at 4096 context.	2026-05-08 13:23:04 +03:00
joelfarthing	9a26522af2	qwen35moe : support MTP tail layer (#1745 ) Co-authored-by: Joel Farthing <262452229+joelfarthing@users.noreply.github.com>	2026-05-07 15:46:41 +03:00
Zhekun Hu	9ddb510787	Add Turing and Ampere (A100) GGML to docker build file (#1691 ) * Add Turing and Ampere (A100) GGML to docker build file At the moment, the docker file for image builds do not build for CUDA architectures below 8.6, and ik_llama.cpp specifies support for architectures Turing and above, this PR sets the CUDA architecture list to include the architecture for Turing (7.5) and A100 (8.0) * Remove 80 because few ppl have A100s and it does seem like many cuda arches cause issues for build * switch to 86-real and 89-real with 75, 80, 90 using virtual ptx jit * nvm, even adding 90-virtual causes linker error --------- Co-authored-by: Codex <codex@local>	2026-05-07 12:58:58 +03:00
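
The fallback chain described in commit ca52a825db (`-tm`, then `-tb`, then `-t`) can be sketched roughly as below. This is only an illustration of the described order, not the project's code: the `ThreadParams` struct, its field names, the "values <= 0 mean not set" convention, and the `resolve_mtmd_threads` helper are all assumptions made for the example.

```cpp
// Hypothetical parameter holder; names and defaults are illustrative only.
struct ThreadParams {
    int n_threads       = 4;   // -t  / --threads
    int n_threads_batch = -1;  // -tb / --threads-batch (<= 0: not set, assumed convention)
    int n_threads_mtmd  = -1;  // -tm / --threads-mtmd  (<= 0: not set, assumed convention)
};

// Pick the thread count for multimodal (mmproj) encoding:
// 1. an explicit -tm value, 2. the batch thread count, 3. the main thread count.
int resolve_mtmd_threads(const ThreadParams & p) {
    if (p.n_threads_mtmd > 0) {
        return p.n_threads_mtmd;
    }
    if (p.n_threads_batch > 0) {
        return p.n_threads_batch;
    }
    return p.n_threads;
}
```

With `-t 1 -tm 16` this sketch would give the multimodal encoder 16 threads while the main LLM path keeps a single thread, which matches the use case the commit message describes.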


4666 Commits
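
The repetition threshold check described in commit 9ad8b8c6db (reject a `{m,n}` expansion when the product of previously generated rules and new rules exceeds a limit) might look something like the following. This is a minimal sketch under stated assumptions: the function names, the threshold value of 2000, and the exception type are hypothetical and are not taken from the actual parser.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical limit; the real threshold used by the parser is not stated in the log.
constexpr uint64_t kMaxRepetitionThreshold = 2000;

// Returns true when n_prev_rules * n_new_rules would exceed the limit.
// The comparison is done by division so the product itself cannot overflow.
bool exceeds_repetition_threshold(uint64_t n_prev_rules, uint64_t n_new_rules,
                                  uint64_t limit = kMaxRepetitionThreshold) {
    if (n_prev_rules == 0 || n_new_rules == 0) {
        return false; // an empty expansion generates no rules
    }
    return n_new_rules > limit / n_prev_rules;
}

// Throws instead of letting a deeply nested repetition expand without bound.
void check_repetition(uint64_t n_prev_rules, uint64_t n_new_rules) {
    if (exceeds_repetition_threshold(n_prev_rules, n_new_rules)) {
        throw std::runtime_error("number of repetitions exceeds the allowed threshold");
    }
}
```

Under these assumed numbers, a nested pattern such as `(([^x]){0,99}){0,99}` would be rejected at parse time, since 99 x 99 is well above the limit, rather than being expanded into thousands of rules.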