From 51331f49737240ec2582d4c1cede8bcf104402eb Mon Sep 17 00:00:00 2001
From: Alex <invertedinkuniverse@proton.me>
Date: Sat, 9 May 2026 01:36:38 -0400
Subject: [PATCH] Fix two speculative-decoding crashes that prevent any usage
 (#1760)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch fixes two bugs in examples/speculative/speculative.cpp
that prevent llama-speculative.exe from running with greedy sampling
(temp=0) or producing rejection-sampling output (temp>0):

1. Line 191: `params.sparams.grammar = { COMMON_GRAMMAR_TYPE_NONE, "" };`
   invokes `common_grammar(type, grammar)`, which asserts
   `type != NONE || !grammar.empty()`. With type NONE and an empty
   string, both operands are false, so every speculative run hits a hard
   `GGML_ASSERT` in common/sampling.h:63 immediately after model load.

   Fix: default-construct with `common_grammar{}`, which bypasses the
   asserting two-argument constructor.

2. Lines 293-294: `GGML_ASSERT(dist_tgt.sorted)` and
   `GGML_ASSERT(dist_dft.sorted)` fire whenever the draft sampler does
   not set the `.sorted` flag, which is the case for most modern sampler
   paths. The next ~10 lines re-sort both distributions by id
   explicitly, so the state of the flag does not matter here.

   Fix: replace the asserts with an explanatory comment.
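
For illustration only, the two failure modes can be sketched in isolation.
This is a minimal stand-in, not the project's code: `grammar_sketch`,
`grammar_type_sketch`, `token_prob` and `residual_sketch` are hypothetical
names, the constructor check throws instead of calling `GGML_ASSERT`, and
the residual step assumes both distributions cover the same token ids.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the grammar type described in item 1.
enum grammar_type_sketch { GRAMMAR_NONE, GRAMMAR_USER };

struct grammar_sketch {
    grammar_type_sketch type = GRAMMAR_NONE;
    std::string         text;

    // Default construction yields an empty grammar and runs no check.
    grammar_sketch() = default;

    // Two-argument construction enforces `type != NONE || !text.empty()`.
    // (Assumption: modelled as an exception so the failure is observable.)
    grammar_sketch(grammar_type_sketch t, std::string g)
        : type(t), text(std::move(g)) {
        if (!(type != GRAMMAR_NONE || !text.empty())) {
            throw std::invalid_argument("NONE type with empty grammar");
        }
    }
};

// Hypothetical token/probability pair for item 2.
struct token_prob {
    int   id;
    float p;
};

// Residual distribution max(0, p_tgt - p_dft), renormalised to sum to 1.
// Both inputs are sorted by id before use, so the order in which they
// arrive (and any "sorted" flag) has no effect on the result.
std::vector<token_prob> residual_sketch(std::vector<token_prob> tgt,
                                        std::vector<token_prob> dft) {
    auto by_id = [](const token_prob & a, const token_prob & b) {
        return a.id < b.id;
    };
    std::sort(tgt.begin(), tgt.end(), by_id);
    std::sort(dft.begin(), dft.end(), by_id);

    float sum = 0.0f;
    for (std::size_t i = 0; i < tgt.size() && i < dft.size(); ++i) {
        tgt[i].p = std::max(0.0f, tgt[i].p - dft[i].p);
        sum += tgt[i].p;
    }
    if (sum > 0.0f) {
        for (auto & t : tgt) {
            t.p /= sum;
        }
    }
    return tgt;
}
```

In this sketch `grammar_sketch(GRAMMAR_NONE, "")` throws while
`grammar_sketch{}` does not, and `residual_sketch` returns the same
result for any input ordering.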

After both fixes, `llama-speculative.exe` runs to completion. The
acceptance-rate measurement at temp=0 still looks suspicious (0%
across same-family draft/target pairs), but that is a different
issue out of scope for this PR.

Tested with Qwen3-0.6B-IQ4_XS drafting for Qwen3-1.7B-IQ4_XS (both
base models from `bartowski/Qwen_Qwen3-*-GGUF`) on Windows, using an
ik_llama.cpp build at HEAD of windows-mingw-default-win10 (itself a
follow-up to PR #1755).
---
 examples/speculative/speculative.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/speculative/speculative.cpp b/examples/speculative/speculative.cpp
index 663e0420..714a989a 100644
--- a/examples/speculative/speculative.cpp
+++ b/examples/speculative/speculative.cpp
@@ -188,7 +188,7 @@ int main(int argc, char ** argv) {
     // draft sequence data
     std::vector<seq_draft> drafts(n_seq_dft);
 
-    params.sparams.grammar = { COMMON_GRAMMAR_TYPE_NONE, ""}; // the draft samplers will copy the target sampler's grammar
+    params.sparams.grammar = common_grammar{}; // the draft samplers will copy the target sampler's grammar
     if (params.sparams.temp == 0) {
         params.sparams.temp = -1.0f; // force greedy sampling with probs for the draft model
     }
@@ -290,8 +290,8 @@ int main(int argc, char ** argv) {
                             drafts[s].active = false;
 
                             // calculate residual probability
-                            GGML_ASSERT(dist_tgt.sorted);
-                            GGML_ASSERT(dist_dft.sorted);
+                            // (the .sorted flag is unreliable across modern sampling
+                            //  paths; we re-sort below regardless, so it doesn't matter.)
                             float sum_probs = 0.0f;
 
                             // sort dist by id