mirror of https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-06-28 04:30:15 -05:00
Fix two speculative-decoding crashes that prevent any usage (#1760)
This patch fixes two latent bugs in examples/speculative/speculative.cpp
that prevent llama-speculative.exe from running with greedy sampling
(temp=0) or producing rejection-sampling output (temp>0):
1. Line 191: `params.sparams.grammar = { COMMON_GRAMMAR_TYPE_NONE, "" };`
invokes `common_grammar(type, grammar)`, which asserts
`type != NONE || !grammar.empty()`. Both operands are false for the
intentionally empty grammar, so every speculative run hits a hard
`GGML_ASSERT` in common/sampling.h:63 immediately after model load.
Fix: default-construct via `common_grammar{}` to bypass the
field-init constructor.
2. Lines 293-294: `GGML_ASSERT(dist_tgt.sorted)` and
`GGML_ASSERT(dist_dft.sorted)` fire whenever the draft sampler does
not set the .sorted flag (which is the case on most modern sampler paths).
The asserts are unnecessary anyway: the next ~10 lines re-sort both
distributions by id explicitly.
Fix: replace the asserts with an explanatory comment.
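The first bug can be reproduced in isolation. The sketch below is a minimal illustration, not the real code: `grammar_sketch`, `grammar_type` and `two_arg_init_fails` are hypothetical stand-ins for the definitions in common/sampling.h, and the check throws instead of aborting via `GGML_ASSERT` so that the failure can be observed.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real definitions live in common/sampling.h.
enum grammar_type { GRAMMAR_TYPE_NONE, GRAMMAR_TYPE_USER };

struct grammar_sketch {
    grammar_type type = GRAMMAR_TYPE_NONE;
    std::string  text;

    // Default construction performs no invariant check, so an empty grammar is allowed.
    grammar_sketch() = default;

    // Two-argument construction enforces the condition quoted in item 1.
    // Assumption: throwing here stands in for the GGML_ASSERT abort.
    grammar_sketch(grammar_type t, std::string g) : type(t), text(std::move(g)) {
        if (!(type != GRAMMAR_TYPE_NONE || !text.empty())) {
            throw std::logic_error("type != NONE || !grammar.empty()");
        }
    }
};

// Returns true when brace-initialising with (NONE, "") trips the check.
inline bool two_arg_init_fails() {
    try {
        grammar_sketch g = { GRAMMAR_TYPE_NONE, "" };
        (void) g;
        return false;
    } catch (const std::logic_error &) {
        return true;
    }
}
```

Under these assumptions, `grammar_sketch{}` yields the same "no grammar" state without ever evaluating the check, which is the effect the fix relies on.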
After both fixes, `llama-speculative.exe` runs to completion. The
acceptance-rate measurement at temp=0 still looks suspicious (0%
across same-family draft/target pairs), but that is a different
issue out of scope for this PR.
Tested on Qwen3-0.6B-IQ4_XS drafting Qwen3-1.7B-IQ4_XS, both base
models from `bartowski/Qwen_Qwen3-*-GGUF` on Windows + ik_llama.cpp
build at HEAD of windows-mingw-default-win10 (which is itself a
follow-up to PR #1755).
parent 96127976f2
commit 51331f4973
@@ -188,7 +188,7 @@ int main(int argc, char ** argv) {
 // draft sequence data
 std::vector<seq_draft> drafts(n_seq_dft);

-params.sparams.grammar = { COMMON_GRAMMAR_TYPE_NONE, ""}; // the draft samplers will copy the target sampler's grammar
+params.sparams.grammar = common_grammar{}; // the draft samplers will copy the target sampler's grammar
 if (params.sparams.temp == 0) {
     params.sparams.temp = -1.0f; // force greedy sampling with probs for the draft model
 }
@@ -290,8 +290,8 @@ int main(int argc, char ** argv) {
 drafts[s].active = false;

 // calculate residual probability
-GGML_ASSERT(dist_tgt.sorted);
-GGML_ASSERT(dist_dft.sorted);
+// (the .sorted flag is unreliable across modern sampling
+// paths; we re-sort below regardless, so it doesn't matter.)
 float sum_probs = 0.0f;

 // sort dist by id
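The second hunk relies on the re-sort by id making the incoming order irrelevant. A minimal sketch of that idea follows. `token_prob`, `token_dist`, `sort_by_id` and `residual` are hypothetical names rather than the types used in speculative.cpp, and the residual formula max(0, p_tgt - p_dft) followed by renormalisation is the standard speculative-sampling form, assumed here because the diff does not show that part of the code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the token distribution types used by the example.
struct token_prob { int id; float p; };
struct token_dist { std::vector<token_prob> data; bool sorted; };

// Re-sort by token id so that index i refers to the same token in both distributions.
inline void sort_by_id(token_dist & d) {
    std::sort(d.data.begin(), d.data.end(),
              [](const token_prob & a, const token_prob & b) { return a.id < b.id; });
}

// Residual max(0, p_tgt - p_dft), renormalised to sum to 1.
// Assumes both distributions cover the same set of token ids.
// The .sorted flag is never read: the explicit re-sort fixes the order either way.
inline std::vector<float> residual(token_dist tgt, token_dist dft) {
    sort_by_id(tgt);
    sort_by_id(dft);
    std::vector<float> out(tgt.data.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = std::max(0.0f, tgt.data[i].p - dft.data[i].p);
        sum += out[i];
    }
    if (sum > 0.0f) {
        for (float & v : out) v /= sum;
    }
    return out;
}
```

Because both inputs are taken by value and re-sorted inside `residual`, the result depends only on the (id, p) pairs, not on the order or the flag the sampler left behind.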