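Installation Guide#

AngelSlim supports the following installation methods:

Install with pip (recommended)#

Default installation (LLM)#

Install the latest stable release of AngelSlim with pip: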

pip install angelslim

If AngelSlim is already installed, force a clean reinstall of the latest release with:

pip install --upgrade --force-reinstall --no-cache-dir angelslim

Speculative sampling installation#

pip install "angelslim[speculative]"

Multimodal installation#

pip install "angelslim[multimodal]"

Diffusion installation#

pip install "angelslim[diffusion]"

Full installation#

pip install "angelslim[all]"
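When the installation is driven from a script (for example, an environment bootstrap), the extras listed above can be selected programmatically. The following is a minimal sketch, not part of AngelSlim: the helper name `pip_install_command` and the `KNOWN_EXTRAS` tuple are hypothetical, and the extras are simply the four named in this guide. It only builds the command for the current interpreter; running it is left commented out.

```python
import sys

# Extras named in this guide; the tuple itself is an illustrative assumption.
KNOWN_EXTRAS = ("speculative", "multimodal", "diffusion", "all")


def pip_install_command(extra=None, upgrade=False):
    """Build a pip command that installs angelslim into the running interpreter."""
    if extra is not None and extra not in KNOWN_EXTRAS:
        raise ValueError(f"unknown extra {extra!r}; expected one of {KNOWN_EXTRAS}")
    requirement = "angelslim" if extra is None else f"angelslim[{extra}]"
    # "python -m pip" ensures the package lands in this interpreter's environment.
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    command.append(requirement)
    return command


# Show the command for the diffusion extra without executing it.
print(" ".join(pip_install_command("diffusion")))

# To actually install, uncomment the following lines:
# import subprocess
# subprocess.check_call(pip_install_command("diffusion"))
```

Passing the command as a list avoids shell quoting issues with the square brackets in the requirement string.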

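Note

  • If the pip installation fails, check your network connection and upgrade pip: pip install --upgrade pip

  • CUDA Toolkit: install the version you need by following the CUDA Toolkit installation documentation.

  • PyTorch build matching your CUDA driver: AngelSlim requires torch>=2.4.1. Install the latest PyTorch build that matches your installed CUDA driver version, or another PyTorch version you need that satisfies this requirement.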
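The torch>=2.4.1 requirement above can be checked before running any compression job. The sketch below uses only the standard library; the helper names (`parse_release`, `torch_is_new_enough`, `installed_torch_version`) are hypothetical, and the version parsing is deliberately simplified: it compares only the numeric release part and ignores pre-release and local suffixes such as `+cu128`.

```python
import re
from importlib import metadata

MIN_TORCH = (2, 4, 1)  # minimum stated in this guide


def parse_release(version):
    """Extract (major, minor, patch) from a version string, ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def torch_is_new_enough(version, minimum=MIN_TORCH):
    """Return True if the numeric release of `version` is at least `minimum`."""
    return parse_release(version) >= minimum


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


found = installed_torch_version()
if found is None:
    print("torch is not installed")
elif torch_is_new_enough(found):
    print(f"torch {found} satisfies the minimum")
else:
    print(f"torch {found} is older than the required 2.4.1")
```

Comparing integer tuples rather than strings matters here: a plain string comparison would rank "2.10.0" below "2.4.1".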

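Build from source#

If you have modified the tool's code, or want the latest features from the main branch, building and installing from a local clone of the repository is recommended: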

cd AngelSlim
python setup.py install

Set the environment variable#

If you have modified the source code, a simpler option is to set the PYTHONPATH environment variable, for example:

export PYTHONPATH=Your/Path/to/AngelSlim/:$PYTHONPATH

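Note

An exported PYTHONPATH only applies to the terminal session in which it was set, so it must be exported in the same terminal that runs the compression script. For example, put both steps in one shell script: export PYTHONPATH first, then run the compression program.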
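To confirm that the PYTHONPATH setting took effect, it helps to check which copy of the package Python would import. The sketch below is a generic standard-library check, not an AngelSlim utility; the helper name `package_origin` is hypothetical. It locates the package without importing it.

```python
import importlib.util


def package_origin(name):
    """Return the file or directory a top-level package would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the package is not visible on sys.path
    if spec.origin is not None:
        return spec.origin  # usually the path to the package's __init__.py
    # Namespace packages have no single origin; report the first search location.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


origin = package_origin("angelslim")
if origin is None:
    print("angelslim is not importable from this environment")
else:
    print(f"angelslim would be imported from: {origin}")
```

If the printed path points into site-packages rather than your modified checkout, the export was not applied in the current terminal, or an installed copy appears earlier on the search path.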

Windows Installation (with FP8 Triton Support)#

AngelSlim supports Windows with FP8 Triton kernels. Follow these steps to build from source:

:: Clone the repository
git clone https://github.com/Tencent/AngelSlim.git
cd AngelSlim

:: Create and activate virtual environment (Python 3.10 recommended)
uv venv --python 3.10
.venv\Scripts\activate

:: Install base dependencies
uv pip install packaging wheel setuptools ninja numpy==1.26.4 pip build psutil

:: Install PyTorch with CUDA 12.8 support
uv pip install torch==2.10.0 --index-url https://download.pytorch.org/whl/cu128

:: Install Triton for Windows
uv pip install -U triton-windows

:: Configure Visual Studio build environment
set INCLUDE=
set LIB=
set LIBPATH=
call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat" x64

:: Configure CUDA environment
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8
set PATH=%CUDA_HOME%\bin;%PATH%
set DISTUTILS_USE_SDK=1

:: Set target CUDA architectures (adjust based on your GPU)
set TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0

:: Build the wheel
set DG_USE_LOCAL_VERSION=0
python setup.py bdist_wheel

:: Verify FP8 Triton kernels are working
python -c "import torch; from angelslim.compressor.diffusion.kernels.python.quantizers import fp8_per_block_quant_triton; from angelslim.compressor.diffusion.kernels.python.gemm import fp8_gemm_triton_block; a,b=torch.randn(128,256,device='cuda'),torch.randn(512,256,device='cuda'); aq,a_s=fp8_per_block_quant_triton(a); bq,b_s=fp8_per_block_quant_triton(b); c=fp8_gemm_triton_block(aq,a_s,bq,b_s); print(f'FP8 GEMM OK: {c.shape}, {c.dtype}')"
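The TORCH_CUDA_ARCH_LIST step above says to adjust the value for your GPU. One way to find the right entry is to query the compute capability of the installed devices through PyTorch. The following is a minimal sketch with hypothetical helper names (`arch_list_entry`, `detect_arch_list`); it assumes PyTorch is already installed with CUDA support and returns None otherwise.

```python
def arch_list_entry(major, minor):
    """Format a compute capability pair the way TORCH_CUDA_ARCH_LIST expects, e.g. 8.6."""
    return f"{major}.{minor}"


def detect_arch_list():
    """Return a semicolon-separated arch list for the visible GPUs, or None."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed
    if not torch.cuda.is_available():
        return None  # no usable CUDA device
    capabilities = {
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    }
    # Sort numerically so that, for example, (9, 0) comes before (12, 0).
    return ";".join(arch_list_entry(major, minor) for major, minor in sorted(capabilities))


detected = detect_arch_list()
if detected is None:
    print("No CUDA device detected; keep or edit TORCH_CUDA_ARCH_LIST manually")
else:
    print(f"set TORCH_CUDA_ARCH_LIST={detected}")
```

Restricting the list to the architectures you actually have typically shortens the build, at the cost of a wheel that will not run on other GPU generations.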

Requirements:

  • Windows 10/11 with NVIDIA GPU (Ampere or newer recommended)

  • Visual Studio 2022 with C++ build tools

  • CUDA Toolkit 12.8

  • Python 3.10

Environment Variables:

  • ANGELSLIM_BACKEND: Force backend selection (triton or pytorch)

  • ANGELSLIM_TORCH_COMPILE: Enable/disable torch.compile (0 or 1)
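The two variables above can also be set from Python instead of the shell. The sketch below makes two assumptions that this guide does not confirm: that AngelSlim reads the variables at import time or later (so they are set before the import), and that 1 means enabled and 0 means disabled for ANGELSLIM_TORCH_COMPILE. The helper name `configure_angelslim_env` is hypothetical.

```python
import os

VALID_BACKENDS = ("triton", "pytorch")  # values listed in this guide


def configure_angelslim_env(backend=None, torch_compile=None):
    """Set the AngelSlim environment variables for the current process."""
    if backend is not None:
        if backend not in VALID_BACKENDS:
            raise ValueError(f"backend must be one of {VALID_BACKENDS}, got {backend!r}")
        os.environ["ANGELSLIM_BACKEND"] = backend
    if torch_compile is not None:
        # Assumption: "1" enables torch.compile and "0" disables it.
        os.environ["ANGELSLIM_TORCH_COMPILE"] = "1" if torch_compile else "0"


# Example: fall back to the PyTorch backend and turn torch.compile off.
configure_angelslim_env(backend="pytorch", torch_compile=False)

# Import the library only after the variables are set (assumed read order).
# import angelslim
```

Variables set this way affect only the current process and its children, which mirrors the same-terminal behavior of `set` in a Windows command prompt.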