Add `-tm` / `--threads-mtmd` to control the number of CPU threads used
during multimodal image/audio processing (mmproj encoding), independently
of the main LLM thread count.
This allows running the LLM on GPU with minimal CPU threads (e.g. `-t 1`)
to reduce sync overhead, while using many threads (e.g. `-tm 16`) for
CPU-bound mmproj encoding with `--no-mmproj-offload`.
Fallback chain when `-tm` is not specified:
1. `--threads-batch` (`-tb`): multimodal encoding is a batch/prefill-like
operation, so it follows the batch thread count
2. `--threads` (`-t`): final default
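
The fallback order above can be sketched roughly as follows. This is an illustration only, not the code in this change: the name `resolve_mtmd_threads`, its parameters, and the convention that a non-positive value means "flag not given" are assumptions made for the example.

```cpp
// Hypothetical helper: picks the thread count for mmproj encoding.
// Assumption: a value <= 0 means the corresponding flag was not specified.
constexpr int resolve_mtmd_threads(int n_threads,        // --threads (-t)
                                   int n_threads_batch,  // --threads-batch (-tb)
                                   int n_threads_mtmd) { // --threads-mtmd (-tm)
    return n_threads_mtmd  > 0 ? n_threads_mtmd    // explicit -tm wins
         : n_threads_batch > 0 ? n_threads_batch   // 1. fall back to -tb
         : n_threads;                              // 2. final default: -t
}
```

Under these assumptions, `-t 1 -tm 16` resolves to 16 encoding threads, `-t 1 -tb 8` resolves to 8, and `-t 4` alone resolves to 4.
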
Works with both mtmd-cli and llama-server.
AI: ubergarm/Qwen3.6-27B-GGUF MTP IQ4_KS 15.113 GiB (4.752 BPW) + pi.dev