Andreas Kieslinger 3fc4e10527
sched : reintroduce less synchronizations during split compute (#20793)
* CUDA:  Improve performance via less synchronizations between token (#17795)

* Adds CPU-to-CUDA copy capability to
ggml_backend_cuda_cpy_tensor_async()

* Adds function to relax sync requirements between input copies on
supported backends (CUDA for now)

* Exchanges synchronous copy with async copy function.

* Adds macro guards to allow compilation in non-CUDA builds

* Reworked backend detection in ggml-backend.cpp to avoid linking
conflicts

* Relax requirement of checks in async CUDA copies from backend and buffer type to just buffer type, to avoid linking issues

* Minor cleanup

* Makes opt-in to relax use of explicit syncs more general. Backends like
vulkan which require a synchronization between HtoD copies and graph
execution could also adopt this change now.

* Reintroduces stricter check for CPU->CUDA backend async copy via
GGML_DEVICE_TYPE_CPU.

* Corrects initialization of ggml_backend_sync_mode in
ggml_backend_sched_split initialization

* Simplifies synchronizations to adhere to `saaasg` pattern.

* Apply suggestion from @ggerganov (src->buffer to buf_src)

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Apply suggestion from @ggerganov (src->buffer to buf_src) v2

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Apply suggestions from @johannesgaessler code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Adds single-GPU synchronizations to multi-GPU settings to fix hip backend pipeline parallel bugs.

* Scheduler Hardening: Exclude hip/MUSA from copy_from_host CPU split ->
GPU split optimization

* Scheduler Hardening: Re-adding original additional synchronizations for
non-async backends

* Adds disclaimer to hip/musa exclusion of copy_from_host. Highlights that it is out of
precaution, but that no perf-impact is visible, and that it can be
revisited separately anytime.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-06-26 17:18:30 +03:00
..
2024-07-13 18:12:39 +02:00