Quantization-Aware Distillation

Quantization-aware distillation (QAD) trains a quantized student against an independent full-precision teacher. It bridges the QAT (quantization-aware training) and Distill paths: from QAT it reuses the quantization modules, learnable-scale plugins, conversion, and save logic; from Distill it reuses the trainer and the distillation losses.

Use Distill for full-precision students. Use QAD when the student should be quantized during training.

Features

Build the student with the same quantization configuration used by QAT.
Load an independent teacher from compression.QAD.teacher_model_path.
Use trainable_parameters to train either all student parameters or only the quantization-related parameters.
Reuse QAT plugin configuration, including learnable weight, activation, KV, norm, and LWC parameters.
Support the same supervised and distillation loss composition as Distill.
Save quantized outputs through QAT save formats such as fake, real, and real_and_kvcache.
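
As a rough illustration of what a supervised-plus-distillation loss composition can look like, the sketch below combines a cross-entropy term with a temperature-scaled KL term in plain Python. The names `combined_loss`, `alpha`, and `temperature` are assumptions made for this example and are not taken from the Distill or QAD code; the loss options that are actually available are defined by the trainer configuration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def cross_entropy(student_logits, label):
    """Supervised term: negative log-likelihood of the ground-truth label."""
    return -log_softmax(student_logits)[label]


def kd_kl(student_logits, teacher_logits, temperature=2.0):
    """Distillation term: KL(teacher || student) on softened distributions."""
    log_p_s = log_softmax(student_logits, temperature)
    log_p_t = log_softmax(teacher_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


def combined_loss(student_logits, teacher_logits, label,
                  alpha=0.5, temperature=2.0):
    """Hypothetical weighting: alpha * supervised + (1 - alpha) * distillation."""
    supervised = cross_entropy(student_logits, label)
    distill = kd_kl(student_logits, teacher_logits, temperature)
    return alpha * supervised + (1.0 - alpha) * distill
```

In QAD the student logits would come from the quantized model and the teacher logits from the full-precision model; the arithmetic of the loss itself is the same as in full-precision distillation.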

W4A8-FP8 Example

This example distills a W4A8-FP8 Qwen3-4B student from a full-precision Qwen3-4B teacher and trains only quantization parameters.

torchrun --nproc_per_node=8 \
  tools/run.py \
  -c configs/qwen3/qad/w4a8_fp8/qwen3-4b_w4a8_fp8_qad_zero2.yaml

Key fields:

compression:
  name: QAD
  quantization:
    name: w4a8_fp8
  QAD:
    teacher_model_path: Qwen/Qwen3-4B
    student_type: quantized
    trainable_parameters: quant
    save_format: real
    plugin_config:
      enable_scale: true
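
To make the w4a8_fp8 name more concrete, here is a minimal fake-quantization sketch in plain Python. It assumes that the scheme means 4-bit weights and 8-bit FP8 activations, that the weights use symmetric signed 4-bit integers, and that the FP8 format is E4M3 with a maximum of 448; none of these details are confirmed by the config shown above. The function names are hypothetical, and the `scale` argument stands in for whatever scale the learnable-scale plugin would provide.

```python
import math

INT4_MIN, INT4_MAX = -8, 7   # signed 4-bit integer range (assumed weight format)
FP8_E4M3_MAX = 448.0         # largest finite E4M3 value (assumed activation format)


def fake_quant_int4(weights):
    """Quantize one group of weights to int4 and dequantize back to float."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    codes = [min(INT4_MAX, max(INT4_MIN, round(w / scale))) for w in weights]
    return [c * scale for c in codes], scale


def fake_quant_fp8_e4m3(x, scale=1.0):
    """Round x / scale to the nearest E4M3 value (saturating), then rescale."""
    v = x / scale
    if v == 0.0 or math.isnan(v):
        return v * scale
    v = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v))
    _, exp = math.frexp(v)            # v = m * 2**exp with 0.5 <= |m| < 1
    exponent = max(exp - 1, -6)       # floor(log2|v|), clamped to the subnormal range
    step = 2.0 ** (exponent - 3)      # spacing of values with 3 mantissa bits
    return round(v / step) * step * scale
```

During training, the forward pass would use these rounded values while gradients flow to the underlying parameters (for example through a straight-through estimator), which is what allows a scale to be learned rather than fixed.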

Special Weight Quantizers

The special weight quantizer path keeps the standard QuantLinear wrapper and switches only the weight quantizer implementation through the config. As the file names indicate, the Qwen3 example configs pair a Qwen3-1.7B student with a Qwen3-4B teacher:

configs/qwen3/qad/special/qwen3-1_7b_sherry_qad_from_qwen3-4b_zero2.yaml
configs/qwen3/qad/special/qwen3-1_7b_absmean_qad_from_qwen3-4b_zero2.yaml
configs/qwen3/qad/special/qwen3-1_7b_twn_qad_from_qwen3-4b_zero2.yaml
configs/qwen3/qad/special/qwen3-1_7b_lsq_qad_from_qwen3-4b_zero2.yaml
configs/qwen3/qad/special/qwen3-1_7b_seq_qad_from_qwen3-4b_zero2.yaml
configs/qwen3/qad/special/qwen3-1_7b_dlt_qad_from_qwen3-4b_zero2.yaml

Run one method by selecting its config:

torchrun --nproc_per_node=8 \
  tools/run.py \
  -c configs/qwen3/qad/special/qwen3-1_7b_sherry_qad_from_qwen3-4b_zero2.yaml
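
The idea of keeping one wrapper and swapping only the weight quantizer can be sketched as a small name-to-function registry. The example below is an assumption-laden illustration in plain Python, not the project's QuantLinear code: the registry `WEIGHT_QUANTIZERS` and the function `quantize_weights` are hypothetical, and only two of the listed methods are shown, using their commonly published formulations (absmean ternary rounding, and ternary weight networks with a 0.7 * mean|w| threshold). The remaining methods are left out because their details cannot be inferred from the config names alone.

```python
def absmean_ternary(weights):
    """Ternary codes from rounding w / mean(|w|) and clipping to [-1, 1]."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def twn_ternary(weights):
    """Ternary codes using the threshold delta = 0.7 * mean(|w|)."""
    delta = 0.7 * sum(abs(w) for w in weights) / len(weights)
    codes = [(1 if w > delta else -1 if w < -delta else 0) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    # The scale is the mean magnitude of the weights that survive the threshold.
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


# Hypothetical registry: the wrapper stays the same, only the entry changes.
WEIGHT_QUANTIZERS = {
    "absmean": absmean_ternary,
    "twn": twn_ternary,
}


def quantize_weights(method, weights):
    """Look up a quantizer by name and return dequantized weights."""
    codes, scale = WEIGHT_QUANTIZERS[method](weights)
    return [c * scale for c in codes]
```

With this layout, choosing a different method amounts to changing one string, which mirrors how the configs above differ only in the quantizer they select.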