ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-06-28 04:30:15 -05:00

Author	SHA1	Message	Date
Henrik Berglund	75f0ab300e	Update repository clone instructions in build.md (#1753)	2026-05-07 12:57:06 +03:00
Andrew Moryakov	485c431b9d	docs : restructure AVX-512 build flags section, recommend GGML_AVX512_*=ON first (#1733) Per @ikawrakow follow-up suggestion in #1729 to "offer the original version at the beginning and note that in case that does not work, they can use GGML_ARCH_FLAGS in that way". Restructured the docs/build.md AVX-512 section so that the recommended high-level CMake options come first, with GGML_ARCH_FLAGS as the fallback for cases where the high-level options don't propagate the necessary macros (older MSVC, ARM cross-compile, exotic toolchains). Empirical confirmation that GGML_AVX512_*=ON activates HAVE_FANCY_SIMD: on MSVC 2022, the resulting compile line (read from build/.../flags.make) contains both `/arch:AVX512` (from GGML_AVX512=ON) and explicit `-D__AVX512VNNI__` / `-D__AVX512VBMI__` / `-D__AVX512BF16__` (added by the matching GGML_AVX512_*=ON options via add_compile_definitions(...) at ggml/src/CMakeLists.txt:1361-1372). The runtime banner prints `HAVE_FANCY_SIMD is defined` and `system_info: AVX512_VNNI = 1`. Also added a brief note about the separate HAVE_VNNI256 gate in iqk_config.h:52-54, which gives meaningful speedups on AVX2-only CPUs with the VNNI extension (some Alder/Raptor Lake parts). Documentation only: no code changes.	2026-05-04 15:32:38 +03:00
Andrew Moryakov	418d60a909	docs : add AVX-512 build flags reference for Zen4 / Sapphire Rapids+ (#1729) The IQK quantized GEMM kernels (ggml/src/iqk/iqk_gemm_.cpp) are gated by HAVE_FANCY_SIMD in iqk_config.h, which requires five AVX-512 macros to be defined: __AVX512F__, __AVX512VNNI__, __AVX512VL__, __AVX512BW__, __AVX512DQ__. If they are not defined, the AVX-512 quantized matmul path is skipped silently: no build warning, no runtime symptom, just lower performance than the hardware can deliver. This surprises users on Windows/MSVC, where -march=native semantics are not propagated. Adds a docs/build.md section that documents: - Which macros gate which path (HAVE_FANCY_SIMD for quant GEMM, __AVX512F__ alone for f16/f32, __AVX512BF16__ for bf16, __AVXVNNI__ for AVX2+VNNI-only CPUs). - Linux/GCC: GGML_NATIVE=ON (default) handles this automatically on Zen4 / Sapphire Rapids; just verify with objdump. - Windows/MSVC and cross-compile: explicit GGML_ARCH_FLAGS with -D__AVX512 defines is required. - Note on Zen4 implementing AVX-512 as 256-bit double-pumped. Documentation only: no code changes, no behavioural changes, no new CMake options introduced.	2026-05-03 17:35:01 +03:00
Kawrakow	666ea0e983	Revise build instructions for ik_llama.cpp Updated documentation to reflect changes from 'llama.cpp' to 'ik_llama.cpp' and clarified build instructions.	2026-03-09 11:23:39 +01:00
mullecofo	f67fd9a452	Update README.md with build instructions for Windows (#1372) * Fix compilation on clang-cl.exe Fixes https://github.com/ikawrakow/ik_llama.cpp/issues/1169 See bitwise arithmetic here: https://clang.llvm.org/doxygen/avx512fintrin_8h_source.html Clang (and GCC) supports a language feature called Vector Extensions. To Clang, `__m512i` is not just a "struct" or a "bag of bits"; it is recognized by the compiler as a native vector type. Because it is a native vector type, Clang automatically maps standard C operators to the corresponding hardware instructions. When you write `a | b`, Clang sees that a and b are 512-bit integer vectors. It implicitly understands that the bitwise OR operator (|) applies to these vectors. It automatically generates the VPORQ (or VPORD) instruction without needing any helper function. MSVC follows a stricter, more traditional C++ model regarding intrinsics. In MSVC, __m512i is defined in the header files (<immintrin.h>) as a struct or union (e.g., typedef struct __m512i { ... } __m512i). To the MSVC compiler, it is essentially a user-defined data type, not a fundamental language primitive like int or float. Standard C++ does not define what `|` means for a user-defined struct. MSVC does not have the same "Vector Extensions" that automatically apply operators to these structs. When you write `a | b` in MSVC, the compiler looks for a definition of `operator|` for the __m512i struct. Since the standard headers don't provide one, the compiler throws an error. You must use the explicit intrinsic function provided by Intel/MSVC: _mm512_or_si512(a, b). To get the nice syntax `(a | b)` in MSVC, you have to manually "teach" the compiler what `|` means by defining the `operator|` overload yourself. * Update README.md with build instructions for Windows The current README lacks any guide for Windows users, even though the build process on that platform is quite complicated * Update build.md with instructions about clang-cl.exe Adds step-by-step build instructions for Windows * Apply suggestions from code review Co-authored-by: Kawrakow <iwankawrakow@gmail.com> * Polish build.md for Windows usage Added a usage example for Windows * Apply suggestions from code review --------- Co-authored-by: Kawrakow <iwankawrakow@gmail.com>	2026-03-09 11:17:26 +01:00
Kawrakow	1a4cfbcc53	Merge mainline - Aug 12 2024 (#17) * Merge mainline * Fix after merge * Remove CI check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-08-12 15:14:32 +02:00
Kawrakow	0ceeb11721	Merge mainline llama.cpp (#3) * Merging mainline - WIP * Merging mainline - WIP AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower, as is so often the case with llama.cpp/ggml after some "improvements" have been made. * Merging mainline - fix Metal * Remove check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-07-27 07:55:01 +02:00

7 Commits
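
Commit 418d60a909 notes that the AVX-512 quantized matmul path is skipped silently when any of the five listed macros is missing. A minimal sketch of how one might check this for a given set of compiler flags follows. The helper names (`fancy_simd_macro_count`, `fancy_simd_macros_present`, `report_fancy_simd_macros`) are hypothetical and not part of ik_llama.cpp; the sketch tests only the five macros named in the commit message and does not reproduce the actual condition in iqk_config.h, which may differ.

```cpp
#include <cstdio>

// Hypothetical helpers for illustration only; not part of ik_llama.cpp.
// They look at the five macros that commit 418d60a909 lists as
// prerequisites for HAVE_FANCY_SIMD.

// Number of the five listed macros defined for this translation unit.
inline int fancy_simd_macro_count() {
    int n = 0;
#ifdef __AVX512F__
    ++n;
#endif
#ifdef __AVX512VNNI__
    ++n;
#endif
#ifdef __AVX512VL__
    ++n;
#endif
#ifdef __AVX512BW__
    ++n;
#endif
#ifdef __AVX512DQ__
    ++n;
#endif
    return n;
}

// True only when all five macros are defined (assumed to mirror the
// condition described in the commit message).
inline bool fancy_simd_macros_present() {
    return fancy_simd_macro_count() == 5;
}

// Prints a one-line report and returns true when all five are present.
inline bool report_fancy_simd_macros() {
    const int n = fancy_simd_macro_count();
    const bool ok = fancy_simd_macros_present();
    std::printf("AVX-512 macros defined: %d of 5 (%s)\n", n,
                ok ? "all present" : "some missing");
    return ok;
}
```

Calling `report_fancy_simd_macros()` from a small `main`, compiled with the same architecture flags as the real build, shows whether those flags define all five macros. A count below 5 suggests the flags are not propagating them, which is the situation the commit describes for Windows/MSVC.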