Installation Guide

AngelSlim supports the following installation methods:

pip installation (recommended)
Build from source
Set environment variables

pip installation (recommended)

Default installation (LLM)

Install the latest stable release of AngelSlim with pip:

pip install angelslim

If AngelSlim is already installed, force a clean reinstall of the latest release with:

pip install --upgrade --force-reinstall --no-cache-dir angelslim

Speculative sampling installation

pip install "angelslim[speculative]"

Multimodal installation

pip install "angelslim[multimodal]"

Diffusion installation

pip install "angelslim[diffusion]"

Full installation

pip install "angelslim[all]"

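Note

If the pip installation fails, check your network connection and upgrade pip: pip install --upgrade pip
CUDA Toolkit: see the CUDA Toolkit installation documentation to install the version you need.
PyTorch version matching the CUDA driver: AngelSlim requires torch>=2.4.1 to run correctly. Install the latest PyTorch build that matches your installed CUDA driver version, or any other PyTorch version you need that satisfies this requirement.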
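
The torch>=2.4.1 requirement can be checked before running AngelSlim. The following is a minimal sketch that uses only the Python standard library. The helper names parse_release and check_min_version are illustrative and are not part of AngelSlim, and the comparison looks only at the numeric release, so a pre-release such as 2.4.1rc1 is treated like 2.4.1.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric release of a version string as a tuple of ints.

    Local build tags such as "+cu128" and pre-release suffixes are ignored.
    """
    release = version.split("+", 1)[0]
    parts = []
    for piece in release.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_min_version(package: str, minimum: str):
    """Return True or False for an installed package, or None if it is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    # 2.4.1 is the minimum torch version stated in the note above.
    print("angelslim installed:", check_min_version("angelslim", "0") is not None)
    print("torch >= 2.4.1:", check_min_version("torch", "2.4.1"))
```

A result of None means the package is not installed in the active environment, which usually points to pip having installed into a different interpreter.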

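Build from source

If you have modified the tool's code, or want the latest features on the main branch, install from source. Run the following from the directory that contains your AngelSlim checkout: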

cd AngelSlim
python setup.py install

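Set environment variables

If you have modified the source code, a simpler option is to point the PYTHONPATH environment variable at your checkout, for example: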

export PYTHONPATH=Your/Path/to/AngelSlim/:$PYTHONPATH

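Note

The exported variable only applies to the shell session in which it was set, so the compression script must run in that same session. For example, put both in one shell script: export PYTHONPATH first, then run the compression program.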
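
To confirm that PYTHONPATH took effect, it can help to check which location Python would import the package from. The following is a small sketch using the standard library; locate_package is an illustrative helper name, not an AngelSlim function.

```python
import importlib.util


def locate_package(name: str):
    """Return the path a top-level package would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    # Regular packages report their __init__.py in spec.origin.
    if spec.origin:
        return spec.origin
    # Namespace packages have no origin, so fall back to their search locations.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


if __name__ == "__main__":
    # Expect a path inside your modified checkout when PYTHONPATH is set correctly.
    print("angelslim resolves to:", locate_package("angelslim"))
```

If the printed path points into site-packages rather than your checkout, an installed copy is shadowing the modified source, or the variable was exported in a different terminal.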

Windows Installation (with FP8 Triton Support)

AngelSlim supports Windows with FP8 Triton kernels. Follow these steps to build from source:

:: Clone the repository
git clone https://github.com/Tencent/AngelSlim.git
cd AngelSlim

:: Create and activate virtual environment (Python 3.10 recommended)
uv venv --python 3.10
.venv\Scripts\activate

:: Install base dependencies
uv pip install packaging wheel setuptools ninja numpy==1.26.4 pip build psutil

:: Install PyTorch with CUDA 12.8 support
uv pip install torch==2.10.0 --index-url https://download.pytorch.org/whl/cu128

:: Install Triton for Windows
uv pip install -U triton-windows

:: Configure Visual Studio build environment
set INCLUDE=
set LIB=
set LIBPATH=
call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat" x64

:: Configure CUDA environment
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8
set PATH=%CUDA_HOME%\bin;%PATH%
set DISTUTILS_USE_SDK=1

:: Set target CUDA architectures (adjust based on your GPU)
set TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0

:: Build the wheel
set DG_USE_LOCAL_VERSION=0
python setup.py bdist_wheel

:: Verify FP8 Triton kernels are working
python -c "import torch; from angelslim.compressor.diffusion.kernels.python.quantizers import fp8_per_block_quant_triton; from angelslim.compressor.diffusion.kernels.python.gemm import fp8_gemm_triton_block; a,b=torch.randn(128,256,device='cuda'),torch.randn(512,256,device='cuda'); aq,a_s=fp8_per_block_quant_triton(a); bq,b_s=fp8_per_block_quant_triton(b); c=fp8_gemm_triton_block(aq,a_s,bq,b_s); print(f'FP8 GEMM OK: {c.shape}, {c.dtype}')"
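
The TORCH_CUDA_ARCH_LIST value in the steps above should match the GPUs you build for. As a rough sketch, the compute capability of the local GPUs can be queried through PyTorch and formatted as a semicolon-separated list. The helper names arch_list and detect_capabilities are illustrative only, and the script assumes PyTorch is already installed in the active environment.

```python
def arch_list(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def detect_capabilities():
    """Query the compute capability of each visible GPU.

    Returns an empty list when PyTorch is missing or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    value = arch_list(detect_capabilities())
    if value:
        print(f"set TORCH_CUDA_ARCH_LIST={value}")
    else:
        print("No CUDA device detected; set TORCH_CUDA_ARCH_LIST manually.")
```

Building only for the architectures you actually use generally shortens compile time compared with the full list.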

Requirements:

Windows 10/11 with NVIDIA GPU (Ampere or newer recommended)
Visual Studio 2022 with C++ build tools
CUDA Toolkit 12.8
Python 3.10

Environment Variables:

ANGELSLIM_BACKEND: Force backend selection (triton or pytorch)
ANGELSLIM_TORCH_COMPILE: Enable/disable torch.compile (0 or 1)
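
These variables can be exported in the shell, or set for a single run from Python. The sketch below builds an environment for a child process. It assumes that 1 enables and 0 disables torch.compile, which the list above does not spell out, and angelslim_env and your_script.py are hypothetical names used for illustration.

```python
import os


def angelslim_env(backend: str = "triton", torch_compile: bool = True) -> dict:
    """Return a copy of the current environment with both AngelSlim variables set."""
    if backend not in ("triton", "pytorch"):
        raise ValueError("backend must be 'triton' or 'pytorch'")
    env = dict(os.environ)
    env["ANGELSLIM_BACKEND"] = backend
    # Assumption: "1" enables torch.compile and "0" disables it.
    env["ANGELSLIM_TORCH_COMPILE"] = "1" if torch_compile else "0"
    return env


# Example usage (your_script.py is a placeholder for your own entry point):
#   import subprocess, sys
#   subprocess.run([sys.executable, "your_script.py"],
#                  env=angelslim_env("pytorch", torch_compile=False), check=True)
```

Passing the variables to a child process keeps the setting scoped to that run and leaves the parent shell untouched.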