6 Commits

Author SHA1 Message Date
Xuan-Son Nguyen
060ce1bf72
mtmd: refactor llava-uhd overview image handling (always use ov_img_first) (#24769)
* add dedicated "overview" for mtmd_image_preproc_out

* corrections

* correct (again)

* nits

* nits (2)
2026-06-18 18:53:49 +02:00
Xuan-Son Nguyen
24bba7b98e
mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736)
* add mtmd_image_preproc_out

* add dev docs

* remove unused clip API

* rm unused clip_image_f32_batch::grid

* change preprocess() call signature
2026-06-18 12:04:39 +02:00
Xuan-Son Nguyen
f3e1828164
mtmd: llava_uhd should no longer use batch dim (#24732) 2026-06-17 22:40:50 +02:00
Saba Fallah
da3f990a47
mtmd: Add DeepSeekOCR 2 Support (#20975)
* mtmd: DeepSeek-OCR 2 support, with multi-tile dynamic resolution

* introduced clip_image_f32::add_viewsep

* address PR review

- drop redundant ggml_cpy ops in both deepseekocr versions build
- drop no-op ggml_cont in build_sam
- assert num_image_tokens deepseekocr2
- view_seperator as (1, n_embd) at conversion (for both versions)
- drop redundant ggml_reshape_2d

* Update tools/mtmd/models/deepseekocr2.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-05-29 16:13:51 +02:00
forforever73
09343c0198
model : support step3-vl-10b (#21287)
* feat: support step3-vl-10b

* use fused QKV && mapping tensor in tensor_mapping.py

* guard hardcoded params and drop crop metadata

* get understand_projector_stride from global config

* img_u8_resize_bilinear_to_f32 move in step3vl class

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix the \r\n mess

* add width and heads to MmprojModel.set_gguf_parameters

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-08 09:51:31 +02:00
Xuan-Son Nguyen
a73bbd5d92
mtmd: refactor image preprocessing (#21031)
* mtmd: refactor image pre-processing

* correct some places

* correct lfm2

* fix deepseek-ocr on server

* add comment to clarify about mtmd_image_preprocessor_dyn_size
2026-03-26 19:49:20 +01:00