Quantization Benchmark
Hunyuan-Instruct
The evaluation results of Hunyuan-Instruct with BF16, FP8, INT4-GPTQ, and INT4-AWQ on OlympiadBench, AIME 2024, DROP, and GPQA-Diamond are as follows:
| Model | Quantization | CEVAL | MMLU | GSM8K | HUMANEVAL |
|---|---|---|---|---|---|
| Hunyuan-A13B-Instruct | BF16 | 82.70 | 87.30 | 91.10 | 71.20 |
| | FP8-Static | 83.00 | 86.70 | 91.10 | - |
| | INT4-GPTQ | 82.70 | 86.70 | 91.10 | - |
| | INT4-AWQ | 82.60 | 85.60 | 91.00 | - |
| Hunyuan-7B-Instruct | BF16 | 76.50 | 81.10 | 85.90 | 60.10 |
| | FP8-Static | 76.60 | 80.90 | 86.00 | 60.10 |
| | INT4-GPTQ | 76.20 | 81.00 | 85.70 | 60.00 |
| | INT4-AWQ | 76.40 | 80.90 | 85.90 | 60.10 |
| Hunyuan-4B-Instruct | BF16 | 73.10 | 78.30 | 78.20 | 61.10 |
| | FP8-Static | 73.10 | 76.60 | 78.30 | 60.20 |
| | INT4-GPTQ | 72.90 | - | 78.10 | 58.10 |
| | INT4-AWQ | 72.80 | - | 78.20 | - |
| Hunyuan-1.8B-Instruct | BF16 | 63.40 | 56.70 | 76.70 | 47.20 |
| | FP8-Static | 62.50 | 55.20 | 75.10 | 47.70 |
| | INT4-GPTQ | 60.90 | - | 73.00 | 44.40 |
| | INT4-AWQ | 61.70 | - | 71.70 | 43.60 |
| Hunyuan-0.5B-Instruct | BF16 | 29.60 | 17.20 | 52.80 | 23.30 |
| | FP8-Static | 29.60 | 17.20 | 51.60 | 22.50 |
| | INT4-GPTQ | 26.80 | - | 50.90 | 23.30 |
| | INT4-AWQ | 26.30 | - | 48.90 | 23.30 |
Qwen3
The evaluation results of the Qwen3 series models with BF16, FP8-Static, FP8-Dynamic, INT8-Dynamic, INT4-GPTQ, and INT4-AWQ on CEVAL, MMLU, GSM8K, and HUMANEVAL are as follows:
| Model | Quantization | CEVAL | MMLU | GSM8K | HUMANEVAL |
|---|---|---|---|---|---|
| Qwen3-0.6B | BF16 | 45.84 | 47.21 | 42.99 | 19.51 |
| | FP8-Static | 45.99 | 46.87 | 38.06 | 18.90 |
| | FP8-Dynamic | 45.99 | 46.93 | 38.29 | 20.73 |
| | INT8-Dynamic | 45.17 | 46.95 | 41.17 | 21.34 |
| Qwen3-1.7B | BF16 | 60.33 | 59.77 | 68.69 | 40.85 |
| | FP8-Static | 61.07 | 59.39 | 68.01 | 38.41 |
| | FP8-Dynamic | 60.77 | 59.88 | 67.10 | 34.76 |
| | INT8-Dynamic | 60.25 | 59.80 | 68.54 | 41.46 |
| | INT4-GPTQ | 57.50 | 56.93 | - | - |
| | INT4-AWQ | 59.06 | 56.86 | - | - |
| Qwen3-4B | BF16 | 72.66 | 69.99 | 85.37 | 72.56 |
| | FP8-Static | 72.14 | 69.93 | 83.70 | 73.17 |
| | FP8-Dynamic | 70.80 | 70.08 | 83.40 | 69.51 |
| | INT8-Dynamic | 72.21 | 69.47 | 85.75 | 66.46 |
| | INT4-GPTQ | 70.06 | 68.59 | 81.65 | - |
| | INT4-AWQ | 70.36 | 67.62 | 80.59 | - |
| Qwen3-8B | BF16 | 79.27 | 74.78 | 87.79 | 63.41 |
| | FP8-Static | 78.23 | 74.79 | 86.96 | 62.20 |
| | FP8-Dynamic | 78.45 | 74.75 | 87.64 | 62.80 |
| | INT8-Dynamic | 78.01 | 74.84 | 86.96 | 67.07 |
| | INT4-GPTQ | 77.19 | 73.26 | 86.43 | 62.20 |
| | INT4-AWQ | 76.15 | 73.59 | 86.96 | 63.41 |
| Qwen3-14B | BF16 | 83.06 | 78.90 | 88.40 | 55.49 |
| | FP8-Static | 82.62 | 78.57 | 89.46 | 57.32 |
| | FP8-Dynamic | 82.24 | 78.92 | 88.32 | 52.44 |
| | INT8-Dynamic | 81.87 | 78.13 | 86.28 | 56.10 |
| | INT4-GPTQ | 81.05 | 78.02 | 87.34 | 57.93 |
| | INT4-AWQ | 82.02 | 77.68 | 84.23 | 61.59 |
| Qwen3-30B-A3B | BF16 | 83.66 | 79.36 | 89.99 | 31.71 |
| | FP8-Static | 83.95 | 79.47 | 89.01 | 31.10 |
| | FP8-Dynamic | 84.10 | 79.40 | 89.16 | 32.93 |
| | INT8-Dynamic | 83.36 | 79.48 | 89.16 | 34.15 |
| Qwen3-32B | BF16 | 86.55 | 82.00 | 74.53 | 37.80 |
| | FP8-Static | 86.92 | 81.78 | 70.20 | 39.63 |
| | FP8-Dynamic | 86.55 | 81.89 | 70.43 | 38.41 |
| | INT4-GPTQ | 86.18 | 81.01 | - | 43.29 |
| | INT4-AWQ | 86.18 | 81.54 | - | 36.59 |
| Qwen3-235B-A22B | BF16 | 89.60 | 86.28 | 85.29 | 27.44 |
| | FP8-Static | 89.67 | 86.19 | 86.96 | 27.44 |
| | FP8-Dynamic | 89.67 | 86.18 | 85.22 | 28.05 |
| | INT8-Dynamic | 88.93 | 86.20 | 86.20 | 23.78 |
| QwQ-32B | BF16 | 85.74 | 82.03 | 73.31 | 42.68 |
| | FP8-Static | 85.44 | 81.91 | 75.36 | 42.68 |
| | FP8-Dynamic | 85.07 | 81.93 | 75.66 | 42.07 |
| | INT8-Dynamic | 86.40 | 81.97 | 74.37 | 45.73 |
| | INT4-GPTQ | 84.03 | 81.26 | 68.23 | 45.73 |
| | INT4-AWQ | 83.58 | 81.01 | 68.69 | 43.29 |
Qwen2.5VL
The evaluation results of the Qwen2.5VL series models with BF16, FP8-Static, FP8-Dynamic, FP8-Static-ViT, FP8-Dynamic-ViT, INT4-GPTQ, and INT4-AWQ on MMMU_VAL, DocVQA_VAL, and ChartQA_TEST are as follows:
| Model | Quantization | MMMU_VAL | DocVQA_VAL | ChartQA_TEST |
|---|---|---|---|---|
| Qwen2.5VL-3B | BF16 | 47.11 | 78.57 | 80.32 |
| | FP8-Static | 47.33 | 79.34 | 79.68 |
| | FP8-Dynamic | 47.00 | 78.92 | 79.60 |
| | FP8-Static-ViT | 45.56 | 79.36 | 80.16 |
| | INT8-Dynamic-ViT | 46.67 | 79.26 | 79.84 |
| | INT4-GPTQ | 46.56 | 77.20 | 78.96 |
| | INT4-AWQ | 45.78 | 79.60 | - |
| Qwen2.5VL-7B | BF16 | 45.44 | 89.71 | 84.64 |
| | FP8-Static | 47.00 | 89.83 | 85.92 |
| | FP8-Dynamic | 47.22 | 89.80 | 88.64 |
| | FP8-Static-ViT | 47.00 | 89.85 | 86.88 |
| | INT8-Dynamic-ViT | 46.44 | 89.68 | 88.72 |
| | INT4-GPTQ | 46.67 | 90.45 | - |
| | INT4-AWQ | 45.67 | 89.28 | - |
| Qwen2.5VL-32B | BF16 | 57.00 | 90.03 | - |
| | FP8-Static | 57.00 | 89.88 | - |
| | FP8-Dynamic | 56.44 | 89.88 | - |
| | FP8-Static-ViT | 56.33 | 89.92 | - |
| | INT8-Dynamic-ViT | 57.22 | 89.88 | - |
| | INT4-GPTQ | 55.22 | 89.80 | - |
| | INT4-AWQ | 55.22 | 90.30 | - |
| Qwen2.5VL-72B | BF16 | 58.78 | 94.39 | 85.60 |
| | FP8-Static | 57.89 | 94.41 | 85.84 |
| | FP8-Dynamic | 58.67 | 94.38 | 85.60 |
| | FP8-Static-ViT | 57.44 | 94.48 | 85.84 |
| | INT8-Dynamic-ViT | 58.22 | 94.47 | 86.00 |
| | INT4-GPTQ | 57.56 | 94.46 | 86.48 |
| | INT4-AWQ | 58.78 | 94.19 | 87.28 |
DeepSeek-R1-0528
The evaluation results of the DeepSeek-R1-0528 model with FP8-Block-Wise and W4A8-FP8 on GPQA Diamond, AIME 2024, SimpleQA, and LiveCodeBench are as follows:
| Model | Quantization | GPQA Diamond | AIME 2024 | SimpleQA | LiveCodeBench |
|---|---|---|---|---|---|
| DeepSeek-R1-0528 | FP8-Block-Wise | 78.28 | 88.67 | 27.80 | 77.1 |
| | W4A8-FP8 | 77.37 | 88.67 | 26.83 | 78.86 |
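When comparing rows in these tables, the quantity of interest is usually the change in score relative to a reference row. The following is a minimal sketch of that arithmetic using the DeepSeek-R1-0528 scores above; the helper name `score_deltas` is hypothetical and not part of any tool mentioned in this document.

```python
# Scores copied from the DeepSeek-R1-0528 table above.
BENCHMARKS = ["GPQA Diamond", "AIME 2024", "SimpleQA", "LiveCodeBench"]
baseline = [78.28, 88.67, 27.80, 77.1]    # FP8-Block-Wise row
candidate = [77.37, 88.67, 26.83, 78.86]  # W4A8-FP8 row


def score_deltas(reference, quantized):
    """Return (quantized - reference) per benchmark, rounded to 2 decimals."""
    return [round(q - r, 2) for r, q in zip(reference, quantized)]


deltas = score_deltas(baseline, candidate)
for name, delta in zip(BENCHMARKS, deltas):
    # A negative value means the second row scored lower than the reference.
    print(f"{name}: {delta:+.2f}")
```

The same helper can be applied to any pair of rows from the other tables, provided both rows have a value for every benchmark being compared.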
Seed-OSS-36B-Instruct
The evaluation results of the Seed-OSS-36B-Instruct model with BF16, FP8-Static, and FP8-Dynamic on CEVAL, MMLU, GSM8K (strict and flexible), and HUMANEVAL are as follows:
| Model | Quantization | CEVAL | MMLU | GSM8K-strict | GSM8K-flexible | HUMANEVAL |
|---|---|---|---|---|---|---|
| Seed-OSS-36B-Instruct | BF16 | 88.19 | 82.97 | 70.36 | 97.12 | 87.20 |
| | FP8-Static | 87.82 | 82.79 | 74.75 | 96.51 | 86.59 |
| | FP8-Dynamic | 87.82 | 82.64 | 74.15 | 96.89 | 87.20 |
These results were obtained with the lm-eval tool. Note that --gen_kwargs max_gen_toks must be set so that long reasoning (thinking) output is not truncated.
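As an illustration of the note above, the sketch below assembles an lm-eval command line that passes a generation budget through `--gen_kwargs`. The vLLM backend, the checkpoint path, the task name, and the value 8192 are all assumptions made for this example; the source does not state which settings were actually used.

```python
import shlex


def build_lm_eval_command(model_path, tasks, max_gen_toks):
    """Assemble an lm-eval command line that raises the generation budget."""
    return [
        "lm_eval",
        "--model", "vllm",  # assumed backend
        "--model_args", f"pretrained={model_path}",
        "--tasks", ",".join(tasks),
        # Larger budget so long reasoning output is not cut off mid-answer.
        "--gen_kwargs", f"max_gen_toks={max_gen_toks}",
    ]


# Hypothetical checkpoint path and token budget, for illustration only.
command = build_lm_eval_command(
    model_path="/path/to/Seed-OSS-36B-Instruct-FP8",
    tasks=["gsm8k"],
    max_gen_toks=8192,
)
print(shlex.join(command))
```

The script only prints the command; pass the list to `subprocess.run` once the path and budget match your setup.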
GLM-4.6
The evaluation results of the GLM-4.6 model with BF16, FP8-Static, and FP8-Dynamic on CEVAL, GSM8K, and HUMANEVAL are as follows:
| Model | Quantization | CEVAL | GSM8K | HUMANEVAL |
|---|---|---|---|---|
| GLM-4.6 | BF16 | 82.6 | 93.71 | 73.78 |
| | FP8-Static | 83.14 | 93.86 | 66.46 |
| | FP8-Dynamic | 82.91 | 93.71 | 63.41 |
Other Models
The evaluation results of other models with BF16, FP8-Static, FP8-Dynamic, INT4-GPTQ, and INT4-AWQ on CEVAL, MMLU, and GSM8K are as follows:
| Model | Quantization | CEVAL | MMLU | GSM8K |
|---|---|---|---|---|
| Qwen2.5-1.5B-Instruct | BF16 | 67.01 | 60.05 | 54.28 |
| | FP8-Static | 66.27 | 60.23 | - |
| | FP8-Dynamic | 66.79 | 60.08 | 51.71 |
| Qwen2.5-7B-Instruct | BF16 | 81.20 | 74.55 | 79.98 |
| | FP8-Static | 81.13 | 74.03 | 79.30 |
| | FP8-Dynamic | 80.31 | 74.07 | 79.00 |
| | INT4-GPTQ | 79.05 | 73.05 | 74.75 |
| | INT4-AWQ | 79.35 | 73.22 | 79.38 |
| Qwen2.5-32B-Instruct | BF16 | 87.30 | 83.21 | 81.73 |
| | FP8-Static | 87.59 | 83.08 | 81.58 |
| | FP8-Dynamic | 87.30 | 83.04 | 81.58 |
| | INT4-GPTQ | 86.70 | 82.45 | 82.03 |
| | INT4-AWQ | 87.00 | 82.64 | - |
| DeepSeek-R1-Distill-Qwen-1.5B | BF16 | 37.22 | 36.63 | 67.02 |
| | FP8-Static | 35.44 | 37.41 | - |
| | FP8-Dynamic | 35.96 | 36.12 | 64.75 |
| DeepSeek-R1-Distill-Qwen-7B | BF16 | 53.49 | 53.80 | 75.74 |
| | FP8-Static | 53.57 | 54.17 | 76.19 |
| | FP8-Dynamic | 52.97 | 54.13 | 74.15 |
| | INT4-GPTQ | 51.86 | 52.44 | 75.89 |
| | INT4-AWQ | 53.49 | 53.70 | - |
| DeepSeek-R1-Distill-Qwen-14B | BF16 | 77.71 | 74.28 | 85.67 |
| | FP8-Static | 77.56 | 74.66 | 86.73 |
| | FP8-Dynamic | 76.82 | 74.63 | 87.11 |
| | INT4-GPTQ | 74.29 | 72.37 | 84.61 |
| | INT4-AWQ | 74.81 | 73.00 | 86.05 |
| DeepSeek-R1-Distill-Qwen-32B | BF16 | 84.18 | 80.89 | 87.41 |
| | FP8-Static | 83.43 | 80.90 | 87.57 |
| | FP8-Dynamic | 83.73 | 81.10 | 86.43 |
| | INT4-GPTQ | 84.10 | 79.80 | 86.73 |
| | INT4-AWQ | 82.84 | 80.15 | 87.19 |
INT4-GPTAQ
The evaluation results of INT4-GPTAQ on GSM8K, HUMANEVAL, and GPQA Diamond are as follows:
| Model | Quantization | GSM8K | HUMANEVAL | GPQA Diamond |
|---|---|---|---|---|
| Qwen3-4B | BF16 | 85.37 | 72.56 | 37.88 |
| | INT4-GPTQ | 81.65 | 61.59 | 35.35 |
| | INT4-GPTAQ | 82.56 | 64.02 | 39.39 |
| Qwen3-8B | BF16 | 87.79 | 63.41 | 32.32 |
| | INT4-GPTQ | 86.43 | 62.20 | 34.85 |
| | INT4-GPTAQ | 86.66 | 64.02 | 33.33 |
| Qwen3-32B | BF16 | 74.53 | 37.80 | 40.40 |
| | INT4-GPTQ | 65.58 | 43.29 | 40.40 |
| | INT4-GPTAQ | 69.52 | 37.20 | - |
NVFP4
The evaluation results of NVFP4 on GSM8K, MMLU, and GPQA Diamond are as follows:
| Model | Quantization | GSM8K | MMLU | GPQA Diamond |
|---|---|---|---|---|
| Qwen3-32B | BF16 | 67.06 | 81.72 | 54.04 |
| | NVFP4 | 69.87 | 80.74 | 56.06 |
| Qwen3-235B-A22B | BF16 | 96.63 | 62.73 | 60.60 |
| | NVFP4 | 96.17 | 62.09 | 60.10 |
Qwen3VL
The evaluation results of the Qwen3VL series models with BF16, FP8-Static, and FP8-Dynamic on MMMU_VAL, DocVQA_VAL, and ChartQA_TEST are as follows:
| Model | Quantization | MMMU_VAL | DocVQA_VAL | ChartQA_TEST |
|---|---|---|---|---|
| Qwen3-VL-32B-Instruct | BF16 | 60.11 | 96.08 | 94.64 |
| | FP8-Static | 61.22 | 96.00 | 94.64 |
| | FP8-Dynamic | 60.78 | 96.19 | 94.72 |
| Qwen3-VL-30B-A3B-Instruct | BF16 | 50.44 | 95.28 | 95.36 |
| | FP8-Dynamic | 50.67 | 95.25 | 95.20 |
FP8-Dynamic uses block-wise quantization. Launch command: python3 tools/fp8_quant_blockwise.py --block_size --input_path --output_path
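To make the idea of block-wise scaling concrete, here is a minimal pure-Python sketch. It is not the implementation in tools/fp8_quant_blockwise.py: it assumes square two-dimensional tiles, assumes the FP8 E4M3 format (largest finite magnitude 448), and only computes per-block scales and rescaled values without performing the actual FP8 rounding. The function names are hypothetical.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def blockwise_scales(weight, block_size):
    """Return one scale per (block_size x block_size) tile of a 2-D list."""
    rows, cols = len(weight), len(weight[0])
    scales = []
    for r0 in range(0, rows, block_size):
        scale_row = []
        for c0 in range(0, cols, block_size):
            # Largest absolute value inside this tile (edge tiles may be smaller).
            max_abs = max(
                abs(weight[r][c])
                for r in range(r0, min(r0 + block_size, rows))
                for c in range(c0, min(c0 + block_size, cols))
            )
            # An all-zero tile gets scale 1.0 to avoid division by zero.
            scale_row.append(max_abs / FP8_E4M3_MAX if max_abs > 0 else 1.0)
        scales.append(scale_row)
    return scales


def scale_to_fp8_range(weight, scales, block_size):
    """Divide every element by the scale of the tile it belongs to."""
    return [
        [w / scales[r // block_size][c // block_size] for c, w in enumerate(row)]
        for r, row in enumerate(weight)
    ]


# Toy 4x4 weight matrix, split into four 2x2 tiles.
weight = [
    [0.5, -1.0, 8.0, 2.0],
    [0.25, 0.75, -4.0, 1.0],
    [3.0, 1.5, 0.1, -0.2],
    [-6.0, 2.0, 0.05, 0.4],
]
scales = blockwise_scales(weight, block_size=2)
scaled = scale_to_fp8_range(weight, scales, block_size=2)
print(scales)
```

Because each tile has its own scale, a single large value (such as 8.0 in the top-right tile) only affects the resolution of its own tile rather than the whole matrix.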
Qwen3-Omni
Qwen3-Omni Text -> Text Benchmark
The evaluation results of the Qwen3-Omni model with BF16, FP8-Static, and FP8-Dynamic on aime25, gpqa_diamond, and mmlu_redux are as follows:
| Model | Quantization | aime25 | gpqa_diamond | mmlu_redux |
|---|---|---|---|---|
| Qwen3-Omni-30B-A3B-Instruct | BF16 | 73.32 | 56.77 | 88.09 |
| | FP8-Static | 71.33 | 56.57 | 87.91 |
| | FP8-Dynamic | 73.33 | 55.15 | 88.07 |