* Add failing test-case to test-backend-ops
Extracted from https://github.com/ggml-org/llama.cpp/issues/24072
* Minimize repro with help of AI
N = 8 * (65535 - 1) + 1 = 524273
* Port and adjust workaround from 0ba798341e
The fallback should share code; also relax the y/z constraint to be inclusive
* Add test-case + fallback for the y dim as well
* Fix x-guards: the x limit is 2^{31}-1, so the check must be inclusive of INT_MAX
* Fix overflow problems for transposed copy kernel
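
For reference, a minimal sketch of the inclusive guards described above. It is not the code from this change: the names `MAX_GRID_X`, `MAX_GRID_YZ`, `ceil_div`, `fits_grid_x` and `fits_grid_yz` are hypothetical, and reading the factor 8 in the N formula as a per-block element count is an assumption made only to illustrate the arithmetic. The limits themselves (2^{31}-1 for x, 65535 for y and z) are the documented CUDA grid limits.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Documented CUDA grid-dimension limits; the constant names are hypothetical.
constexpr int64_t MAX_GRID_X  = INT_MAX; // 2^31 - 1
constexpr int64_t MAX_GRID_YZ = 65535;

// Number of blocks needed to cover n elements at `per_block` elements each.
// Done in 64-bit so that n + per_block - 1 cannot overflow a 32-bit int.
constexpr int64_t ceil_div(int64_t n, int64_t per_block) {
    return (n + per_block - 1) / per_block;
}

// Inclusive guards: a block count equal to the limit is still launchable.
constexpr bool fits_grid_x(int64_t nblocks)  { return nblocks <= MAX_GRID_X; }
constexpr bool fits_grid_yz(int64_t nblocks) { return nblocks <= MAX_GRID_YZ; }

// Assumption for illustration: 8 elements per block.
// N = 8 * (65535 - 1) + 1 = 524273 then needs exactly 65535 blocks,
// which sits on the y/z limit and passes only if the check is inclusive.
static_assert(ceil_div(524273, 8) == 65535, "N lands exactly on the y/z limit");
static_assert(fits_grid_yz(ceil_div(524273, 8)), "inclusive check accepts it");
static_assert(!fits_grid_yz(65536), "one more block needs the fallback");
```

A strict `<` comparison would reject the 65535-block case and take the fallback unnecessarily; anything above the limit still has to go through the fallback path.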