Oliver Simons 1ec44d178d
CUDA: Various fixes to cpy.cu (#25000)
* Add failing test-case to test-backend-ops

Extracted from https://github.com/ggml-org/llama.cpp/issues/24072

* Minimize repro with help of AI

N = 8 * (65535 - 1) + 1 = 524273

* Port and adjust workaround from 0ba798341e

Fall-back should share code, also relax y-z constraint to be inclusive

* Add test-case + fallback also for y dim

* Fix x-guards which is 2^{31}-1, so inlusive of INT_MAX

* Fix overflow problems for transposed copy kernel
2026-06-25 17:29:23 +02:00
..