ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-06-28 04:30:15 -05:00

Author	SHA1	Message	Date
Yadir Hernandez Batista	8df5cbc0b3	Fix Build and Push Container Image (#1633 ) * fix: Updating cleanup step * fix: Updated trigger for build-container * fix: Unset token for cleanup step * fix: Set build-container cleanup step without run-dry * fix: Removed 100 commits from checkout actions * fix: Enable the whole history * test: Suggestion from mcm007 on build-container * fix: add token to package cleanup step Explicitly pass GITHUB_TOKEN to the delete-package-versions action to ensure it has sufficient authorization for package deletion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: - Error: delete version API failed. Package not found. * fix: Deleted LLAMA_COMMIT from build * fix: Removed id-token permission --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 11:24:19 +02:00
Kawrakow	2c455ec468	Change container build action to manual dispatch	2026-04-11 06:10:08 +00:00
Yadir Hernandez Batista	7f4d106d25	Fix for build-container (#1609 ) * ci: implement build matrix for CUDA/CPU containers with dynamic tagging * fix: Updated Docker images/build-container.yml * fix: Updated the documentation about Docker * fix: Set Arch for 3090s * fix: Updated build step name. * fix: Set target ARCH as a variable * feat: Added cleanup step * feat: Added docker-bake and updated workflow * fix: Issue with REPO_OWNER variable * fix: Updated workflow to solve errors * fix: Updated branch format * fix: Wrong naming * Update docker-bake.hcl * Update build-container.yml * Update ik_llama-cuda.Containerfile * Update ik_llama-cpu.Containerfile * Update docker-bake.hcl * Update build-container.yml * Removed action/cache * added -sSL for reliability and fixed the URL path * added -sSL for reliability and fixed the URL path CUDA containerfile * fix: correct Dockerfile RUN command syntax errors - Combine split apt-get install commands in both Containerfiles - Fix broken cmake command continuation in ik_llama-cuda.Containerfile * fix: correct llama-swap download URL in Containerfiles - Fix broken line continuation in curl download URL for llama-swap * perf: improve ccache configuration in Containerfiles - Add CCACHE_UMASK=000 for cache accessibility across stages - Add CCACHE_MAXSIZE=1G to prevent unbounded growth - Initialize ccache with ccache -i during build stage * fix: remove problematic ccache initialization from Containerfiles - ccache -i fails because CCACHE_DIR mount doesn't exist yet at build time * fix: add git to CPU Containerfile build dependencies - Resolves CMake warning about missing Git for build info * chore: optimize Containerfile with smaller images and better healthchecks - Add --no-install-recommends to all apt-get commands for smaller image size - Add ca-certificates to base stage for HTTPS support - Merge redundant build copy commands from 3 layers to 1 - Fix llama-swap version from 198 to v199 (latest release) - Add HEALTHCHECK 
configuration with interval/timeout/retries to server and swap stages - Copy /app/lib in server stage to fix container startup * chore: fix CUDA Containerfile healthchecks and swap version - Add /app/lib copy in server stage to fix container startup - Fix llama-swap version from 198 to v199 (latest release) - Add HEALTHCHECK configuration with interval/timeout/retries * chore: fix indentation in Containerfiles and add LD_LIBRARY_PATH for server target * fix: add --break-system-packages flag for pip in CPU Containerfile * feat: add git bind mount for build info and NCCL support for CUDA * fix: remove libnccl-dev from CUDA build (already included in base image) * fix: added Markdown files to ignore files * feat: use BUILD_NUMBER-COMMIT pattern for docker image tags - Add BUILD_NUMBER and LLAMA_COMMIT to build workflow - Update docker-bake.hcl to use version tag format matching llama-server --version output - Format: VARIANT-BUILD_NUMBER-COMMIT (e.g., cu12-full-4406-3bc90dfd) * fix: fetch full git history for accurate BUILD_NUMBER - Add fetch-depth: 0 to actions/checkout to get all commits - This ensures git rev-list --count HEAD returns correct total commit count * fix: fetch full git history in Dockerfile for accurate BUILD_NUMBER - Add git fetch --unshallow to get complete commit history during build - This ensures build-info.cpp is generated with correct LLAMA_BUILD_NUMBER * chore: update GitHub Actions to latest versions for Node.js 24 compatibility - docker/setup-buildx-action@v3 -> v4 - docker/login-action@v3 -> v4 * chore: update all GitHub Actions to Node.js 24 compatible versions - actions/checkout@v4 -> v6 - docker/setup-buildx-action@v3 -> v5 - docker/login-action@v3 -> v6 - docker/bake-action@v5 -> v7 * fix: use CI-passed BUILD_NUMBER and LLAMA_COMMIT in Dockerfile - Add BUILD_NUMBER and LLAMA_COMMIT as build args - Fall back to git commands if not provided - Pass values explicitly to cmake for accurate build info * fix: pass BUILD_NUMBER and LLAMA_COMMIT 
as Docker build args - Add BUILD_NUMBER and LLAMA_COMMIT to docker bake args - These will be used by the Containerfile for accurate build info * fix: revert docker actions to v4 (latest available versions) * fix: calculate BUILD_NUMBER and LLAMA_COMMIT directly in Containerfile - Removed ARG defaults since we calculate from git during build - Use git rev-list --count HEAD and git rev-parse for accurate version info - Falls back to 0/unknown if git commands fail * feat: calculate BUILD_NUMBER and LLAMA_COMMIT in Containerfiles - Add git-based version calculation in both CPU and CUDA Containerfiles - Remove .git bind mount (git is copied with COPY .) - Pass build info to CMake for accurate llama-server --version output * feat: calculate BUILD_NUMBER and LLAMA_COMMIT in Containerfiles - Add git-based version calculation using git rev-list and git rev-parse - Copy .git directory separately to ensure git commands work during build - Pass build info to CMake for accurate llama-server --version output * fix: cache improvements for CUDA and CPU builds * fix: "/.git": not found * fix: Unnecessary mv llama-swap * fix: Remove BUILD_NUMBER and LLAMA_COMMIT from docker file, calculated by cmake proc * fix: remove .git from dockerignore for local and CI builds - Enables cmake to access .git directory during Docker build - Required for version calculation in llama-server binary - GitHub Actions uses explicit mount via bake action set parameter * fix: Remove mounts key from Build and Push step in gh workflow * ci: add .git verification step before build * refactor: standardize Containerfile structure and remove .git mount dependency - Remove --mount=type=bind,source=.git,target=.git from both Containerfiles - Replace COPY . . 
with git clone for cleaner build context - Add CUSTOM_COMMIT ARG for optional custom commit switching - Standardize ARG/ENV ordering and comment formatting across CPU/CUDA variants - Install ca-certificates before git clone to fix SSL verification issues - Rename 'Structured artifact collection' to 'Collect build artifacts' * ci: remove broken cache pruning step * ci: remove broken prune-cache job - Remove prune-cache job that was failing due to missing .git directory - The job required a checkout step and the cache pruning logic was non-critical * chore: Removed step for Verifying .git existance in GH workflow * fix: ensure build always proceeds even if git switch fails - Add '\|\| true' to git switch command so build continues on failure - This prevents the entire RUN step from failing when CUSTOM_COMMIT is invalid * fix: resolve Docker build pipeline issues - Remove external git clone from Containerfiles, use build context directly - Add BUILD_NUMBER and BUILD_COMMIT as CMake cache variables in build-info.cmake - Fix .devops/tools.sh inclusion by using explicit COPY for hidden directories - Set USE_CCACHE=true for CI builds - Clean up unused SHA_SHORT variable from docker-bake.hcl Fixes: Build steps were cached incorrectly due to external git clone ignoring the actual build context source. 
* fix: include .git in Docker build context and add verification * ci: add .git directory verification step after checkout * build: fix .git mount path for Docker build context compatibility * build: fix .git mount path for Docker build context compatibility * docker: include .git in build context for version calculation * ci: add .git directory verification step after checkout * chore: Removed unecessary Verify .git step (It was a test) * docs: update README with docker-bake and build-local.sh instructions * docs: remove build-local.sh reference (not in repo) * ci: optimize disk usage by limiting fetch depth and cleaning workspace * fix: cleanup step in workflow --------- Co-authored-by: HP Prodesk <sourceupdev@gmail.com>	2026-04-10 18:03:10 +02:00
Yadir Hernandez Batista	db31e7d803	Added workflow to build container images (#1279 ) * ci: implement build matrix for CUDA/CPU containers with dynamic tagging * fix: Updated Docker images/build-container.yml * fix: Updated the documentation about Docker * fix: Set Arch for 3090s * fix: Updated build step name. * fix: Set target ARCH as a variable * feat: Added cleanup step * feat: Added docker-bake and updated workflow * fix: Issue with REPO_OWNER variable * fix: Updated workflow to solve errors * fix: Updated branch format * fix: Wrong naming * Update docker-bake.hcl * Update build-container.yml * Update ik_llama-cuda.Containerfile * Update ik_llama-cpu.Containerfile * Update docker-bake.hcl * Update build-container.yml * Removed action/cache * added -sSL for reliability and fixed the URL path * added -sSL for reliability and fixed the URL path CUDA containerfile * fix: correct Dockerfile RUN command syntax errors - Combine split apt-get install commands in both Containerfiles - Fix broken cmake command continuation in ik_llama-cuda.Containerfile * fix: correct llama-swap download URL in Containerfiles - Fix broken line continuation in curl download URL for llama-swap * perf: improve ccache configuration in Containerfiles - Add CCACHE_UMASK=000 for cache accessibility across stages - Add CCACHE_MAXSIZE=1G to prevent unbounded growth - Initialize ccache with ccache -i during build stage * fix: remove problematic ccache initialization from Containerfiles - ccache -i fails because CCACHE_DIR mount doesn't exist yet at build time * fix: add git to CPU Containerfile build dependencies - Resolves CMake warning about missing Git for build info * chore: optimize Containerfile with smaller images and better healthchecks - Add --no-install-recommends to all apt-get commands for smaller image size - Add ca-certificates to base stage for HTTPS support - Merge redundant build copy commands from 3 layers to 1 - Fix llama-swap version from 198 to v199 (latest release) - Add 
HEALTHCHECK configuration with interval/timeout/retries to server and swap stages - Copy /app/lib in server stage to fix container startup * chore: fix CUDA Containerfile healthchecks and swap version - Add /app/lib copy in server stage to fix container startup - Fix llama-swap version from 198 to v199 (latest release) - Add HEALTHCHECK configuration with interval/timeout/retries * chore: fix indentation in Containerfiles and add LD_LIBRARY_PATH for server target * fix: add --break-system-packages flag for pip in CPU Containerfile * feat: add git bind mount for build info and NCCL support for CUDA * fix: remove libnccl-dev from CUDA build (already included in base image) * fix: added Markdown files to ignore files * feat: use BUILD_NUMBER-COMMIT pattern for docker image tags - Add BUILD_NUMBER and LLAMA_COMMIT to build workflow - Update docker-bake.hcl to use version tag format matching llama-server --version output - Format: VARIANT-BUILD_NUMBER-COMMIT (e.g., cu12-full-4406-3bc90dfd) * fix: fetch full git history for accurate BUILD_NUMBER - Add fetch-depth: 0 to actions/checkout to get all commits - This ensures git rev-list --count HEAD returns correct total commit count * fix: fetch full git history in Dockerfile for accurate BUILD_NUMBER - Add git fetch --unshallow to get complete commit history during build - This ensures build-info.cpp is generated with correct LLAMA_BUILD_NUMBER * chore: update GitHub Actions to latest versions for Node.js 24 compatibility - docker/setup-buildx-action@v3 -> v4 - docker/login-action@v3 -> v4 * chore: update all GitHub Actions to Node.js 24 compatible versions - actions/checkout@v4 -> v6 - docker/setup-buildx-action@v3 -> v5 - docker/login-action@v3 -> v6 - docker/bake-action@v5 -> v7 * fix: use CI-passed BUILD_NUMBER and LLAMA_COMMIT in Dockerfile - Add BUILD_NUMBER and LLAMA_COMMIT as build args - Fall back to git commands if not provided - Pass values explicitly to cmake for accurate build info * fix: pass BUILD_NUMBER and 
LLAMA_COMMIT as Docker build args - Add BUILD_NUMBER and LLAMA_COMMIT to docker bake args - These will be used by the Containerfile for accurate build info * fix: revert docker actions to v4 (latest available versions) * fix: calculate BUILD_NUMBER and LLAMA_COMMIT directly in Containerfile - Removed ARG defaults since we calculate from git during build - Use git rev-list --count HEAD and git rev-parse for accurate version info - Falls back to 0/unknown if git commands fail * feat: calculate BUILD_NUMBER and LLAMA_COMMIT in Containerfiles - Add git-based version calculation in both CPU and CUDA Containerfiles - Remove .git bind mount (git is copied with COPY .) - Pass build info to CMake for accurate llama-server --version output * feat: calculate BUILD_NUMBER and LLAMA_COMMIT in Containerfiles - Add git-based version calculation using git rev-list and git rev-parse - Copy .git directory separately to ensure git commands work during build - Pass build info to CMake for accurate llama-server --version output * fix: cache improvements for CUDA and CPU builds * fix: "/.git": not found * fix: Unnecessary mv llama-swap * fix: Remove BUILD_NUMBER and LLAMA_COMMIT from docker file, calculated by cmake proc * fix: remove .git from dockerignore for local and CI builds - Enables cmake to access .git directory during Docker build - Required for version calculation in llama-server binary - GitHub Actions uses explicit mount via bake action set parameter * fix: Remove mounts key from Build and Push step in gh workflow * ci: add .git verification step before build * refactor: standardize Containerfile structure and remove .git mount dependency - Remove --mount=type=bind,source=.git,target=.git from both Containerfiles - Replace COPY . . 
with git clone for cleaner build context - Add CUSTOM_COMMIT ARG for optional custom commit switching - Standardize ARG/ENV ordering and comment formatting across CPU/CUDA variants - Install ca-certificates before git clone to fix SSL verification issues - Rename 'Structured artifact collection' to 'Collect build artifacts' * ci: remove broken cache pruning step * ci: remove broken prune-cache job - Remove prune-cache job that was failing due to missing .git directory - The job required a checkout step and the cache pruning logic was non-critical * chore: Removed step for Verifying .git existance in GH workflow * fix: ensure build always proceeds even if git switch fails - Add '\|\| true' to git switch command so build continues on failure - This prevents the entire RUN step from failing when CUSTOM_COMMIT is invalid * fix: resolve Docker build pipeline issues - Remove external git clone from Containerfiles, use build context directly - Add BUILD_NUMBER and BUILD_COMMIT as CMake cache variables in build-info.cmake - Fix .devops/tools.sh inclusion by using explicit COPY for hidden directories - Set USE_CCACHE=true for CI builds - Clean up unused SHA_SHORT variable from docker-bake.hcl Fixes: Build steps were cached incorrectly due to external git clone ignoring the actual build context source. 
* fix: include .git in Docker build context and add verification * ci: add .git directory verification step after checkout * build: fix .git mount path for Docker build context compatibility * build: fix .git mount path for Docker build context compatibility * docker: include .git in build context for version calculation * ci: add .git directory verification step after checkout * chore: Removed unecessary Verify .git step (It was a test) * docs: update README with docker-bake and build-local.sh instructions * docs: remove build-local.sh reference (not in repo) * ci: optimize disk usage by limiting fetch depth and cleaning workspace --------- Co-authored-by: HP Prodesk <sourceupdev@gmail.com>	2026-04-10 08:06:47 +02:00
Kawrakow	0ceeb11721	Merge mainline llama.cpp (#3 ) * Merging mainline - WIP * Merging mainline - WIP AVX2 and CUDA appear to work. CUDA performance seems slightly (~1-2%) lower as it is so often the case with llama.cpp/ggml after some "improvements" have been made. * Merging mainline - fix Metal * Remove check --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-07-27 07:55:01 +02:00
Kawrakow	43f4c58376	Remove all workflows	2024-06-27 09:45:56 +03:00
slaren	028d6b31c6	ggml : synchronize threads using barriers (#7993 )	2024-06-19 15:04:15 +02:00
Georgi Gerganov	efc3d09e43	codecov : remove (#8004 )	2024-06-19 13:04:36 +03:00
Georgi Gerganov	0a673baa03	github : update pr template	2024-06-16 10:46:51 +03:00
olexiyb	7006c85155	ci : fix macos x86 build (#7940 ) In order to use old `macos-latest` we should use `macos-12` Potentially will fix: https://github.com/ggerganov/llama.cpp/issues/6975	2024-06-14 20:28:34 +03:00
Olivier Chafik	b267b997c5	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 ) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit e474ef1df481fd8936cd7d098e3065d7de378930. 
* add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>	2024-06-13 00:41:52 +01:00
Deven Mistry	a08dd44cb8	fix broken link in pr template (#7880 ) [no ci] * fix broken link in pr template * Update pull_request_template.md [no ci] --------- Co-authored-by: Brian <mofosyne@gmail.com>	2024-06-12 02:18:58 +10:00
Brian	ea1bb2b82b	github: move PR template to .github/ root (#7868 )	2024-06-11 17:43:41 +03:00
slaren	e4e6f9abea	fix CUDA CI by using a windows-2019 image (#7861 ) * try to fix CUDA ci with --allow-unsupported-compiler * trigger when build.yml changes * another test * try exllama/bdashore3 method * install vs build tools before cuda toolkit * try win-2019	2024-06-11 08:59:20 +03:00
slaren	27d373a411	ci : try win-2019 on server windows test (#7854 )	2024-06-10 15:18:41 +03:00
Nicolás Pérez	f6bbf78d23	docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700 ) This commit adds pull_request_template.md and CONTRIBUTING.md . It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions. Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>	2024-06-10 01:24:29 +10:00
Georgi Gerganov	8de006f83e	ggml : remove OpenCL (#7735 ) ggml-ci	2024-06-04 21:23:20 +03:00
Masaya, Kato	2c040f0269	ggml : use OpenMP as a thread pool (#7606 ) * ggml: Added OpenMP for multi-threads processing * ggml : Limit the number of threads used to avoid deadlock * update shared state n_threads in parallel region * clear numa affinity for main thread even with openmp * enable openmp by default * fix msvc build * disable openmp on macos * ci : disable openmp with thread sanitizer * Update ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-03 17:14:15 +02:00
Brian	0a83967e79	github: add contact links to issues and convert question into research [no ci] (#7612 )	2024-05-30 21:55:36 +10:00
Meng, Hengyu	991a5632cd	[SYCL] fix intel docker (#7630 ) * Update main-intel.Dockerfile * workaround for https://github.com/intel/oneapi-containers/issues/70 * reset intel docker in CI * add missed in server	2024-05-30 16:19:08 +10:00
Brian	d42f92b628	github: add refactor to issue template (#7561 ) * github: add refactor issue template [no ci] * Update 07-refactor.yml	2024-05-28 20:27:27 +10:00
Brian	11a3ec860b	github: add self sorted issue ticket forms (#7543 ) * github: add self sorted issue ticket forms [no ci] * github: consolidate BSD in bug issue ticket * github: remove contact from bug ticket template [no ci] * github: remove bios from os dropdown in bug report [no ci]	2024-05-27 10:54:30 +10:00
Brian	39c7118283	labeler: added Apple Metal detector (+Kompute) (#7529 ) * labeler: added Apple Metal detector [no ci] * labeler: add Kompute to detector [no ci]	2024-05-25 19:30:42 +10:00
Brian	7929f977ac	docker.yml: disable light-intel and server-intel test (#7515 ) * docker.yml: disable light-intel test * docker.yml: disable server-intel test	2024-05-24 23:47:56 +10:00
Brian	75e5b1388c	labeler.yml: add embedding label detector [no ci] (#7482 )	2024-05-23 17:40:43 +10:00
Georgi Gerganov	c37774e1ef	build : remove zig (#7471 )	2024-05-22 20:05:38 +03:00
Georgi Gerganov	31b2d6e05b	server : fix temperature + disable some tests (#7409 ) * server : fix temperature * server : disable tests relying on parallel determinism * ci : change server Debug -> RelWithDebInfo	2024-05-20 22:10:03 +10:00
slaren	cb9cf0fb9b	llama : remove MPI backend (#7395 )	2024-05-20 01:17:03 +02:00
Brian	a846498a4a	labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363 ) https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action Recommends the use of checkout action to use the correct repo context when applying settings for PR labels e.g. steps: - uses: actions/checkout@v4 # Uploads repository content to the runner with: repository: "owner/repositoryName" # The one of the available inputs, visit https://github.com/actions/checkout#readme to find more - uses: actions/labeler@v5 with: configuration-path: 'path/to/the/uploaded/configuration/file'	2024-05-19 20:51:03 +10:00
Georgi Gerganov	cc5796c0ec	ci : re-enable sanitizer runs (#7358 ) * Revert "ci : temporary disable sanitizer builds (#6128)" This reverts commit 4f6d1337ca5a409dc74aca8c479b7c34408a69c0. * ci : trigger	2024-05-18 18:55:54 +03:00
Brian	85733c54b1	github-actions-labeler: initial commit (#7330 ) * github-actions-labeler: initial commit [no ci] * github actions: remove priority auto labeling [no ci]	2024-05-18 16:04:23 +10:00
Gavin Zhao	37d4f164ef	ROCm: use native CMake HIP support (#5966 ) Supercedes #4024 and #4813. CMake's native HIP support has become the recommended way to add HIP code into a project (see [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)). This PR makes the following changes: 1. The environment variable `HIPCXX` or CMake option `CMAKE_HIP_COMPILER` should be used to specify the HIP compiler. Notably this shouldn't be `hipcc`, but ROCm's clang, which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`. Note that since native CMake HIP support is not yet available on Windows, on Windows we fall back to the old behavior. 2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the GPU architectures to build for. Previously this was controled by `GPU_TARGETS`. 3. Updated the Nix recipe to account for these new changes. 4. The GPU targets to build against in the Nix recipe is now consistent with the supported GPU targets in nixpkgs. 5. Added CI checks for HIP on both Linux and Windows. On Linux, we test both the new and old behavior. The most important part about this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, compared to the current situation where when building for ROCm support, everything must be compiled with ROCm's clang. ~~Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still uses `GPU_TARGETS` to control architectures to build for, but I feel like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're calling `make`.~~ Makefile used `GPU_TARGETS` but the README says to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`. 
Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS` to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new HIP support. Signed-off-by: Gavin Zhao <git@gzgz.dev>	2024-05-17 17:03:03 +02:00
Max Krasnyansky	ab97fe155a	ci: fix bin/Release path for windows-arm64 builds (#7317 ) Switch to Ninja Multi-Config CMake generator to resurrect bin/Release path that broke artifact packaging in CI.	2024-05-16 15:36:43 +10:00
Max Krasnyansky	5cc8a89c08	Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191 ) * logging: add proper checks for clang to avoid errors and warnings with VA_ARGS * build: add CMake Presets and toolchain files for Windows ARM64 * matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings * ci: add support for optimized Windows ARM64 builds with MSVC and LLVM * matmul-int8: fixed typos in q8_0_q8_0 matmuls Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * matmul-int8: remove unnecessary casts in q8_0_q8_0 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-16 12:47:36 +10:00
Radoslav Gerganov	af81b28dbf	ggml : add RPC backend (#6829 ) * ggml : add RPC backend The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc). * set TCP_NODELAY * add CI workflows * Address review comments * fix warning * implement llama_max_devices() for RPC * Address review comments * Address review comments * wrap sockfd into a struct * implement get_alignment and get_max_size * add get_device_memory * fix warning * win32 support * add README * readme : trim trailing whitespace * Address review comments * win32 fix * Address review comments * fix compile warnings on macos	2024-05-14 14:27:19 +03:00
Neo Zhang	1fa2d4319b	[SYCL] Add oneapi runtime dll files to win release package (#7241 ) * add oneapi running time dlls to release package * fix path * fix path * fix path * fix path * fix path --------- Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:04:29 +08:00
Neo Zhang	b079bf29ed	[SYCL] update CI with oneapi 2024.1 (#7235 ) Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:02:55 +08:00
Sigbjørn Skjæret	484922ff64	Disable benchmark on forked repo (#7034 ) * Disable benchmark on forked repo * only check owner on schedule event * check owner on push also * more readable as multi-line * ternary won't work * style++ * test++ * enable actions debug * test-- * remove debug * test++ * do debug where we can get logs * test-- * this is driving me crazy * correct github.event usage * remove test condition * correct github.event usage * test++ * test-- * event_name is pull_request_target * test++ * test-- * update ref checks	2024-05-05 13:38:55 +02:00
Brian	7bd1c13a56	convert.py : add python logging instead of print() (#6511 ) * convert.py: add python logging instead of print() * convert.py: verbose flag takes priority over dump flag log suppression * convert.py: named instance logging * convert.py: use explicit logger id string * convert.py: convert extra print() to named logger * convert.py: sys.stderr.write --> logger.error * .py: Convert all python scripts to use logging module requirements.txt: remove extra line * flake8: update flake8 ignore and exclude to match ci settings * gh-actions: add flake8-no-print to flake8 lint step * pre-commit: add flake8-no-print to flake8 and also update pre-commit version * convert-hf-to-gguf.py: print() to logger conversion * .py: logging basiconfig refactor to use conditional expression .py: removed commented out logging fixup! .py: logging basiconfig refactor to use conditional expression constant.py: logger.error then exit should be a raise exception instead * .py: Convert logger error and sys.exit() into a raise exception (for atypical error) gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar * verify-checksum-model.py: This is the result of the program, it should be printed to stdout. * compare-llama-bench.py: add blank line for readability during missing repo response * reader.py: read_gguf_file() use print() over logging * convert.py: warning goes to stderr and won't hurt the dump output * gguf-dump.py: dump_metadata() should print to stdout * convert-hf-to-gguf.py: print --> logger.debug or ValueError() * verify-checksum-models.py: use print() for printing table * .py: refactor logging.basicConfig() gguf-py/gguf/.py: use __name__ as logger name Since they will be imported and not run directly. 
python-lint.yml: use .flake8 file instead * constants.py: logger no longer required * convert-hf-to-gguf.py: add additional logging * convert-hf-to-gguf.py: print() --> logger * .py: fix flake8 warnings revert changes to convert-hf-to-gguf.py for get_name() * convert-hf-to-gguf-update.py: use triple quoted f-string instead * .py: accidentally corrected the wrong line *.py: add compilade warning suggestions and style fixes	2024-05-03 22:36:41 +03:00
slaren	82e7f68958	ci : exempt confirmed bugs from being tagged as stale (#7014 )	2024-05-01 08:13:59 +03:00
Olivier Chafik	b9688eda68	build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964 ) * readme: cmake . -B build && cmake --build build * build: fix typo Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * build: drop implicit . from cmake config command * build: remove another superfluous . * build: update MinGW cmake commands * Update README-sycl.md Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com> * build: reinstate --config Release as not the default w/ some generators + document how to build Debug * build: revert more --config Release * build: nit / remove -H from cmake example * build: reword debug instructions around single/multi config split --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2024-04-29 17:02:45 +01:00
Georgi Gerganov	820703bf9c	llama : fix BPE pre-tokenization (#6920 ) * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * lint : fix * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * models : add phi-3, mpt, gpt-2, starcoder * 
tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>	2024-04-29 16:58:41 +03:00
Przemysław Pawełczyk	2307a7b21c	ci : add building in MSYS2 environments (Windows) (#6967 )	2024-04-29 15:59:47 +03:00
Pierrick Hymbert	6feab329fe	ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935 ) * ci: server: fix python env * ci: server: fix server tests after #6638 * ci: server: fix windows is not building PR branch	2024-04-27 17:50:48 +02:00
Pierrick Hymbert	e5ef23a472	ci: server: fix python installation (#6925 )	2024-04-26 12:27:25 +02:00
Pierrick Hymbert	bdd8ba3806	ci: server: fix python installation (#6922 )	2024-04-26 11:11:51 +02:00
Pierrick Hymbert	a1cc26069f	ci: server: fix python installation (#6918 )	2024-04-26 09:27:49 +02:00
Pierrick Hymbert	f8c07fa19a	ci: fix concurrency for pull_request_target (#6917 )	2024-04-26 09:26:59 +02:00
Pierrick Hymbert	563887f3b9	ci: fix job are cancelling each other (#6781 )	2024-04-22 13:22:54 +02:00
loonerin	ba48614c84	ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748 )	2024-04-19 19:03:35 +02:00


193 Commits