* fix: Updating cleanup step
* fix: Updated trigger for build-container
* fix: Unset token for cleanup step
* fix: Set build-container cleanup step without dry-run
* fix: Removed 100 commits from checkout actions
* fix: Enable the whole history
* test: Suggestion from mcm007 on build-container
* fix: add token to package cleanup step
Explicitly pass GITHUB_TOKEN to the delete-package-versions action to ensure it has sufficient authorization for package deletion.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: Error: delete version API failed. Package not found.
* fix: Deleted LLAMA_COMMIT from build
* fix: Removed id-token permission
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* ci: implement build matrix for CUDA/CPU containers with dynamic tagging
* fix: Updated Docker images/build-container.yml
* fix: Updated the documentation about Docker
* fix: Set Arch for 3090s
* fix: Updated build step name.
* fix: Set target ARCH as a variable
* feat: Added cleanup step
* feat: Added docker-bake and updated workflow
* fix: Issue with REPO_OWNER variable
* fix: Updated workflow to solve errors
* fix: Updated branch format
* fix: Wrong naming
* Update docker-bake.hcl
* Update build-container.yml
* Update ik_llama-cuda.Containerfile
* Update ik_llama-cpu.Containerfile
* Update docker-bake.hcl
* Update build-container.yml
* Removed actions/cache
* added -sSL for reliability and fixed the URL path
* added -sSL for reliability and fixed the URL path in the CUDA Containerfile
* fix: correct Dockerfile RUN command syntax errors
- Combine split apt-get install commands in both Containerfiles
- Fix broken cmake command continuation in ik_llama-cuda.Containerfile
* fix: correct llama-swap download URL in Containerfiles
- Fix broken line continuation in curl download URL for llama-swap
* perf: improve ccache configuration in Containerfiles
- Add CCACHE_UMASK=000 for cache accessibility across stages
- Add CCACHE_MAXSIZE=1G to prevent unbounded growth
- Initialize ccache with ccache -i during build stage
* fix: remove problematic ccache initialization from Containerfiles
- ccache -i fails because CCACHE_DIR mount doesn't exist yet at build time
* fix: add git to CPU Containerfile build dependencies
- Resolves CMake warning about missing Git for build info
* chore: optimize Containerfile with smaller images and better healthchecks
- Add --no-install-recommends to all apt-get commands for smaller image size
- Add ca-certificates to base stage for HTTPS support
- Merge redundant build copy commands from 3 layers to 1
- Fix llama-swap version from 198 to v199 (latest release)
- Add HEALTHCHECK configuration with interval/timeout/retries to server and swap stages
- Copy /app/lib in server stage to fix container startup
* chore: fix CUDA Containerfile healthchecks and swap version
- Add /app/lib copy in server stage to fix container startup
- Fix llama-swap version from 198 to v199 (latest release)
- Add HEALTHCHECK configuration with interval/timeout/retries
* chore: fix indentation in Containerfiles and add LD_LIBRARY_PATH for server target
* fix: add --break-system-packages flag for pip in CPU Containerfile
* feat: add git bind mount for build info and NCCL support for CUDA
* fix: remove libnccl-dev from CUDA build (already included in base image)
* fix: added Markdown files to ignore files
* feat: use BUILD_NUMBER-COMMIT pattern for docker image tags
- Add BUILD_NUMBER and LLAMA_COMMIT to build workflow
- Update docker-bake.hcl to use version tag format matching llama-server --version output
- Format: VARIANT-BUILD_NUMBER-COMMIT (e.g., cu12-full-4406-3bc90dfd)
* fix: fetch full git history for accurate BUILD_NUMBER
- Add fetch-depth: 0 to actions/checkout to get all commits
- This ensures git rev-list --count HEAD returns correct total commit count
* fix: fetch full git history in Dockerfile for accurate BUILD_NUMBER
- Add git fetch --unshallow to get complete commit history during build
- This ensures build-info.cpp is generated with correct LLAMA_BUILD_NUMBER
* chore: update GitHub Actions to latest versions for Node.js 24 compatibility
- docker/setup-buildx-action@v3 -> v4
- docker/login-action@v3 -> v4
* chore: update all GitHub Actions to Node.js 24 compatible versions
- actions/checkout@v4 -> v6
- docker/setup-buildx-action@v3 -> v5
- docker/login-action@v3 -> v6
- docker/bake-action@v5 -> v7
* fix: use CI-passed BUILD_NUMBER and LLAMA_COMMIT in Dockerfile
- Add BUILD_NUMBER and LLAMA_COMMIT as build args
- Fall back to git commands if not provided
- Pass values explicitly to cmake for accurate build info
* fix: pass BUILD_NUMBER and LLAMA_COMMIT as Docker build args
- Add BUILD_NUMBER and LLAMA_COMMIT to docker bake args
- These will be used by the Containerfile for accurate build info
* fix: revert docker actions to v4 (latest available versions)
* fix: calculate BUILD_NUMBER and LLAMA_COMMIT directly in Containerfile
- Removed ARG defaults since we calculate from git during build
- Use git rev-list --count HEAD and git rev-parse for accurate version info
- Falls back to 0/unknown if git commands fail
* feat: calculate BUILD_NUMBER and LLAMA_COMMIT in Containerfiles
- Add git-based version calculation in both CPU and CUDA Containerfiles
- Remove .git bind mount (git is copied with COPY .)
- Pass build info to CMake for accurate llama-server --version output
* feat: calculate BUILD_NUMBER and LLAMA_COMMIT in Containerfiles
- Add git-based version calculation using git rev-list and git rev-parse
- Copy .git directory separately to ensure git commands work during build
- Pass build info to CMake for accurate llama-server --version output
* fix: cache improvements for CUDA and CPU builds
* fix: "/.git": not found
* fix: Unnecessary mv llama-swap
* fix: Remove BUILD_NUMBER and LLAMA_COMMIT from the Dockerfile; they are calculated by CMake
* fix: remove .git from dockerignore for local and CI builds
- Enables cmake to access .git directory during Docker build
- Required for version calculation in llama-server binary
- GitHub Actions uses explicit mount via bake action set parameter
* fix: Remove mounts key from Build and Push step in gh workflow
* ci: add .git verification step before build
* refactor: standardize Containerfile structure and remove .git mount dependency
- Remove --mount=type=bind,source=.git,target=.git from both Containerfiles
- Replace COPY . . with git clone for cleaner build context
- Add CUSTOM_COMMIT ARG for optional custom commit switching
- Standardize ARG/ENV ordering and comment formatting across CPU/CUDA variants
- Install ca-certificates before git clone to fix SSL verification issues
- Rename 'Structured artifact collection' to 'Collect build artifacts'
* ci: remove broken cache pruning step
* ci: remove broken prune-cache job
- Remove prune-cache job that was failing due to missing .git directory
- The job required a checkout step and the cache pruning logic was non-critical
* chore: Removed step for verifying .git existence in GH workflow
* fix: ensure build always proceeds even if git switch fails
- Add '|| true' to git switch command so build continues on failure
- This prevents the entire RUN step from failing when CUSTOM_COMMIT is invalid
* fix: resolve Docker build pipeline issues
- Remove external git clone from Containerfiles, use build context directly
- Add BUILD_NUMBER and BUILD_COMMIT as CMake cache variables in build-info.cmake
- Fix .devops/tools.sh inclusion by using explicit COPY for hidden directories
- Set USE_CCACHE=true for CI builds
- Clean up unused SHA_SHORT variable from docker-bake.hcl
Fixes: Build steps were cached incorrectly due to external git clone ignoring the actual build context source.
* fix: include .git in Docker build context and add verification
* ci: add .git directory verification step after checkout
* build: fix .git mount path for Docker build context compatibility
* build: fix .git mount path for Docker build context compatibility
* docker: include .git in build context for version calculation
* ci: add .git directory verification step after checkout
* chore: Removed unnecessary Verify .git step (it was a test)
* docs: update README with docker-bake and build-local.sh instructions
* docs: remove build-local.sh reference (not in repo)
* ci: optimize disk usage by limiting fetch depth and cleaning workspace
* fix: cleanup step in workflow
---------
Co-authored-by: HP Prodesk <sourceupdev@gmail.com>
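A minimal Python sketch of the version logic these entries describe: BUILD_NUMBER comes from `git rev-list --count HEAD`, the commit from `git rev-parse`, both fall back to 0/unknown when git fails, and the image tag follows VARIANT-BUILD_NUMBER-COMMIT. The 8-character `--short=8` abbreviation (chosen to match the `3bc90dfd` example) and the injectable `run` helper are assumptions for illustration; the actual Containerfiles and docker-bake.hcl may compute these differently. On a shallow clone, `rev-list --count` only counts the commits that were fetched, which is why full history (`fetch-depth: 0`) matters for an accurate BUILD_NUMBER.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# Hypothetical command runner: takes argv, returns stripped stdout, raises on failure.
Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stripped stdout; raises if it fails."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout.strip()


def build_info(run: Runner = run_command) -> Tuple[int, str]:
    """Return (BUILD_NUMBER, short commit), falling back to (0, "unknown")."""
    try:
        build_number = int(run(["git", "rev-list", "--count", "HEAD"]))
        commit = run(["git", "rev-parse", "--short=8", "HEAD"])
    except (OSError, subprocess.CalledProcessError, ValueError):
        # git not installed, not a repository, or unparsable output
        return 0, "unknown"
    return build_number, commit


def image_tag(variant: str, build_number: int, commit: str) -> str:
    """Compose a VARIANT-BUILD_NUMBER-COMMIT tag."""
    return f"{variant}-{build_number}-{commit}"
```

In a full clone, `image_tag("cu12-full", *build_info())` produces a tag shaped like the `cu12-full-4406-3bc90dfd` example above.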
* Merging mainline - WIP
* Merging mainline - WIP
AVX2 and CUDA appear to work.
CUDA performance seems slightly (~1-2%) lower, as is so often
the case with llama.cpp/ggml after some "improvements" have been made.
* Merging mainline - fix Metal
* Remove check
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate the PR complexity level, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
* ggml: Added OpenMP for multi-threads processing
* ggml : Limit the number of threads used to avoid deadlock
* update shared state n_threads in parallel region
* clear numa affinity for main thread even with openmp
* enable openmp by default
* fix msvc build
* disable openmp on macos
* ci : disable openmp with thread sanitizer
* Update ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Supersedes #4024 and #4813.
CMake's native HIP support has become the
recommended way to add HIP code into a project (see
[here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).
This PR makes the following changes:
1. The environment variable `HIPCXX` or CMake option
`CMAKE_HIP_COMPILER` should be used to specify the HIP
compiler. Notably this shouldn't be `hipcc`, but ROCm's clang,
which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously
this was controlled by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
Note that because native CMake HIP support is not yet available on
Windows, we fall back to the old behavior there.
2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the
GPU architectures to build for. Previously this was controlled by
`GPU_TARGETS`.
3. Updated the Nix recipe to account for these new changes.
4. The GPU targets to build against in the Nix recipe are now
consistent with the supported GPU targets in nixpkgs.
5. Added CI checks for HIP on both Linux and Windows. On Linux, we test
both the new and old behavior.
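As a rough illustration of points 1 and 2, the Python helper below assembles a configure step that uses native HIP support: `HIPCXX` points at ROCm's clang under `$ROCM_PATH/llvm/bin/clang` rather than at `hipcc`, and `CMAKE_HIP_ARCHITECTURES` lists the GPU targets. The `-DLLAMA_HIPBLAS=ON` flag, the helper name, and the `gfx1030` target in the usage line are assumptions, not taken from this PR.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple


def hip_configure_command(
    rocm_path: str, architectures: List[str], build_dir: str = "build"
) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment, argv) for a CMake configure using native HIP support."""
    env = dict(os.environ)
    # Point CMake's HIP language at ROCm's clang, not at hipcc.
    env["HIPCXX"] = str(Path(rocm_path) / "llvm" / "bin" / "clang")
    argv = [
        "cmake",
        "-S", ".",
        "-B", build_dir,
        "-DLLAMA_HIPBLAS=ON",  # assumed option name for enabling the HIP backend
        # CMake lists are ';'-separated, e.g. "gfx1030;gfx1100".
        "-DCMAKE_HIP_ARCHITECTURES=" + ";".join(architectures),
    ]
    return env, argv
```

Usage: `env, argv = hip_configure_command("/opt/rocm", ["gfx1030"])`, then `subprocess.run(argv, env=env, check=True)`. Setting `-DCMAKE_HIP_COMPILER=...` instead of `HIPCXX` is the other option point 1 mentions.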
The most important part about this PR is the separation of the
HIP compiler and the C/C++ compiler. This allows users to choose
a different C/C++ compiler if desired, whereas currently everything
must be compiled with ROCm's clang when building with ROCm support.
~~Makefile is unchanged. Please let me know if we want to be
consistent on variables' naming because Makefile still uses
`GPU_TARGETS` to control architectures to build for, but I feel
like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're
calling `make`.~~ Makefile used `GPU_TARGETS` but the README says
to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of
`GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.
Thanks to the suggestion of @jin-eld, to maintain backwards
compatibility (and not break too many downstream users' builds), if
`CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using
the original behavior and emit a warning that recommends switching
to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but
`CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`
to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new
HIP support.
Signed-off-by: Gavin Zhao <git@gzgz.dev>
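The transition rules in the PR text above can be modelled in a few lines of Python. This is a hypothetical sketch of the decision logic only (the real implementation lives in CMake); `HipConfig`, `resolve_hip_config`, and the warning wording are illustrative, `cache` stands in for CMake cache variables, and forwarding `AMDGPU_TARGETS` regardless of mode is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HipConfig:
    mode: str                         # "native" (CMake HIP language) or "legacy" (hipcc as CXX)
    hip_architectures: Optional[str]  # value used for CMAKE_HIP_ARCHITECTURES
    warnings: List[str] = field(default_factory=list)


def resolve_hip_config(cache: Dict[str, str], is_windows: bool) -> HipConfig:
    """Hypothetical Python model of the transition rules; not the real CMake code."""
    warnings: List[str] = []
    cxx = cache.get("CMAKE_CXX_COMPILER", "")

    if is_windows:
        # No native CMake HIP support on Windows yet: keep the old behavior.
        mode = "legacy"
    elif cxx.endswith("hipcc"):
        # Backwards compatibility: build the old way, but recommend switching.
        mode = "legacy"
        warnings.append("consider native HIP support via HIPCXX or CMAKE_HIP_COMPILER")
    else:
        mode = "native"

    archs = cache.get("CMAKE_HIP_ARCHITECTURES")
    if not archs and cache.get("AMDGPU_TARGETS"):
        # Forward the old variable to the new one to ease the transition.
        archs = cache["AMDGPU_TARGETS"]

    return HipConfig(mode, archs, warnings)
```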
* logging: add proper checks for clang to avoid errors and warnings with VA_ARGS
* build: add CMake Presets and toolchain files for Windows ARM64
* matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings
* ci: add support for optimized Windows ARM64 builds with MSVC and LLVM
* matmul-int8: fixed typos in q8_0_q8_0 matmuls
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* matmul-int8: remove unnecessary casts in q8_0_q8_0
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Disable benchmark on forked repo
* only check owner on schedule event
* check owner on push also
* more readable as multi-line
* ternary won't work
* style++
* test++
* enable actions debug
* test--
* remove debug
* test++
* do debug where we can get logs
* test--
* this is driving me crazy
* correct github.event usage
* remove test condition
* correct github.event usage
* test++
* test--
* event_name is pull_request_target
* test++
* test--
* update ref checks
* convert.py: add python logging instead of print()
* convert.py: verbose flag takes priority over dump flag log suppression
* convert.py: named instance logging
* convert.py: use explicit logger id string
* convert.py: convert extra print() to named logger
* convert.py: sys.stderr.write --> logger.error
* *.py: Convert all python scripts to use logging module
* requirements.txt: remove extra line
* flake8: update flake8 ignore and exclude to match ci settings
* gh-actions: add flake8-no-print to flake8 lint step
* pre-commit: add flake8-no-print to flake8 and also update pre-commit version
* convert-hf-to-gguf.py: print() to logger conversion
* *.py: logging basicConfig refactor to use conditional expression
* *.py: removed commented out logging
* fixup! *.py: logging basicConfig refactor to use conditional expression
* constants.py: logger.error followed by exit should raise an exception instead
* *.py: Convert logger error and sys.exit() into a raised exception (for atypical errors)
* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar
* verify-checksum-models.py: This is the result of the program; it should be printed to stdout.
* compare-llama-bench.py: add blank line for readability during missing repo response
* reader.py: read_gguf_file() use print() over logging
* convert.py: warning goes to stderr and won't hurt the dump output
* gguf-dump.py: dump_metadata() should print to stdout
* convert-hf-to-gguf.py: print --> logger.debug or ValueError()
* verify-checksum-models.py: use print() for printing table
* *.py: refactor logging.basicConfig()
* gguf-py/gguf/*.py: use __name__ as logger name
Since they will be imported and not run directly.
* python-lint.yml: use .flake8 file instead
* constants.py: logger no longer required
* convert-hf-to-gguf.py: add additional logging
* convert-hf-to-gguf.py: print() --> logger
* *.py: fix flake8 warnings
* revert changes to convert-hf-to-gguf.py for get_name()
* convert-hf-to-gguf-update.py: use triple quoted f-string instead
* *.py: accidentally corrected the wrong line
* *.py: add compilade warning suggestions and style fixes
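A minimal sketch of the logging conventions these commits converge on: a script configures `logging.basicConfig()` once with a level chosen by a conditional expression, `--verbose` takes priority over the log suppression triggered by `--dump`, library modules under `gguf-py/gguf/` name their logger with `__name__`, and program results such as dumps stay on stdout via `print()`. Suppressing to `WARNING` under `--dump` and the logger name `"convert"` are assumptions for illustration.

```python
import argparse
import logging
from typing import List, Optional

# Library modules are imported rather than run, so they name their logger after the module.
module_logger = logging.getLogger(__name__)


def log_level(verbose: bool, dump: bool) -> int:
    """Pick the root log level: --verbose wins, --dump otherwise hides INFO logs."""
    return logging.DEBUG if verbose else logging.WARNING if dump else logging.INFO


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="logging setup sketch")
    parser.add_argument("--verbose", action="store_true", help="enable debug logging")
    parser.add_argument("--dump", action="store_true", help="print metadata to stdout")
    args = parser.parse_args(argv)

    logging.basicConfig(level=log_level(args.verbose, args.dump))
    logger = logging.getLogger("convert")  # explicit logger id string for a script

    logger.info("loading model")  # diagnostic: goes to stderr via logging
    if args.dump:
        print("metadata dump")  # program result: goes to stdout
    return 0
```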