Quantization Benchmarks

Hunyuan-Instruct

Evaluation results for the Hunyuan-Instruct models with BF16, FP8, INT4-GPTQ, and INT4-AWQ on OlympiadBench, AIME 2024, DROP, and GPQA-Diamond are as follows:

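| Model | Quantization | OlympiadBench | AIME 2024 | DROP | GPQA-Diamond |
|---|---|---|---|---|---|
| Hunyuan-A13B-Instruct | BF16 | 82.70 | 87.30 | 91.10 | 71.20 |
| Hunyuan-A13B-Instruct | FP8-Static | 83.00 | 86.70 | 91.10 | - |
| Hunyuan-A13B-Instruct | INT4-GPTQ | 82.70 | 86.70 | 91.10 | - |
| Hunyuan-A13B-Instruct | INT4-AWQ | 82.60 | 85.60 | 91.00 | - |
| Hunyuan-7B-Instruct | BF16 | 76.50 | 81.10 | 85.90 | 60.10 |
| Hunyuan-7B-Instruct | FP8-Static | 76.60 | 80.90 | 86.00 | 60.10 |
| Hunyuan-7B-Instruct | INT4-GPTQ | 76.20 | 81.00 | 85.70 | 60.00 |
| Hunyuan-7B-Instruct | INT4-AWQ | 76.40 | 80.90 | 85.90 | 60.10 |
| Hunyuan-4B-Instruct | BF16 | 73.10 | 78.30 | 78.20 | 61.10 |
| Hunyuan-4B-Instruct | FP8-Static | 73.10 | 76.60 | 78.30 | 60.20 |
| Hunyuan-4B-Instruct | INT4-GPTQ | 72.90 | - | 78.10 | 58.10 |
| Hunyuan-4B-Instruct | INT4-AWQ | 72.80 | - | 78.20 | - |
| Hunyuan-1.8B-Instruct | BF16 | 63.40 | 56.70 | 76.70 | 47.20 |
| Hunyuan-1.8B-Instruct | FP8-Static | 62.50 | 55.20 | 75.10 | 47.70 |
| Hunyuan-1.8B-Instruct | INT4-GPTQ | 60.90 | - | 73.00 | 44.40 |
| Hunyuan-1.8B-Instruct | INT4-AWQ | 61.70 | - | 71.70 | 43.60 |
| Hunyuan-0.5B-Instruct | BF16 | 29.60 | 17.20 | 52.80 | 23.30 |
| Hunyuan-0.5B-Instruct | FP8-Static | 29.60 | 17.20 | 51.60 | 22.50 |
| Hunyuan-0.5B-Instruct | INT4-GPTQ | 26.80 | - | 50.90 | 23.30 |
| Hunyuan-0.5B-Instruct | INT4-AWQ | 26.30 | - | 48.90 | 23.30 |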

Qwen3

Evaluation results for the Qwen3 series models with BF16, FP8-Static, FP8-Dynamic, INT8-Dynamic, INT4-GPTQ, and INT4-AWQ on CEVAL, MMLU, GSM8K, and HUMANEVAL are as follows:

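| Model | Quantization | CEVAL | MMLU | GSM8K | HUMANEVAL |
|---|---|---|---|---|---|
| Qwen3-0.6B | BF16 | 45.84 | 47.21 | 42.99 | 19.51 |
| Qwen3-0.6B | FP8-Static | 45.99 | 46.87 | 38.06 | 18.90 |
| Qwen3-0.6B | FP8-Dynamic | 45.99 | 46.93 | 38.29 | 20.73 |
| Qwen3-0.6B | INT8-Dynamic | 45.17 | 46.95 | 41.17 | 21.34 |
| Qwen3-1.7B | BF16 | 60.33 | 59.77 | 68.69 | 40.85 |
| Qwen3-1.7B | FP8-Static | 61.07 | 59.39 | 68.01 | 38.41 |
| Qwen3-1.7B | FP8-Dynamic | 60.77 | 59.88 | 67.10 | 34.76 |
| Qwen3-1.7B | INT8-Dynamic | 60.25 | 59.80 | 68.54 | 41.46 |
| Qwen3-1.7B | INT4-GPTQ | 57.50 | 56.93 | - | - |
| Qwen3-1.7B | INT4-AWQ | 59.06 | 56.86 | - | - |
| Qwen3-4B | BF16 | 72.66 | 69.99 | 85.37 | 72.56 |
| Qwen3-4B | FP8-Static | 72.14 | 69.93 | 83.70 | 73.17 |
| Qwen3-4B | FP8-Dynamic | 70.80 | 70.08 | 83.40 | 69.51 |
| Qwen3-4B | INT8-Dynamic | 72.21 | 69.47 | 85.75 | 66.46 |
| Qwen3-4B | INT4-GPTQ | 70.06 | 68.59 | 81.65 | - |
| Qwen3-4B | INT4-AWQ | 70.36 | 67.62 | 80.59 | - |
| Qwen3-8B | BF16 | 79.27 | 74.78 | 87.79 | 63.41 |
| Qwen3-8B | FP8-Static | 78.23 | 74.79 | 86.96 | 62.20 |
| Qwen3-8B | FP8-Dynamic | 78.45 | 74.75 | 87.64 | 62.80 |
| Qwen3-8B | INT8-Dynamic | 78.01 | 74.84 | 86.96 | 67.07 |
| Qwen3-8B | INT4-GPTQ | 77.19 | 73.26 | 86.43 | 62.20 |
| Qwen3-8B | INT4-AWQ | 76.15 | 73.59 | 86.96 | 63.41 |
| Qwen3-14B | BF16 | 83.06 | 78.90 | 88.40 | 55.49 |
| Qwen3-14B | FP8-Static | 82.62 | 78.57 | 89.46 | 57.32 |
| Qwen3-14B | FP8-Dynamic | 82.24 | 78.92 | 88.32 | 52.44 |
| Qwen3-14B | INT8-Dynamic | 81.87 | 78.13 | 86.28 | 56.10 |
| Qwen3-14B | INT4-GPTQ | 81.05 | 78.02 | 87.34 | 57.93 |
| Qwen3-14B | INT4-AWQ | 82.02 | 77.68 | 84.23 | 61.59 |
| Qwen3-30B-A3B | BF16 | 83.66 | 79.36 | 89.99 | 31.71 |
| Qwen3-30B-A3B | FP8-Static | 83.95 | 79.47 | 89.01 | 31.10 |
| Qwen3-30B-A3B | FP8-Dynamic | 84.10 | 79.40 | 89.16 | 32.93 |
| Qwen3-30B-A3B | INT8-Dynamic | 83.36 | 79.48 | 89.16 | 34.15 |
| Qwen3-32B | BF16 | 86.55 | 82.00 | 74.53 | 37.80 |
| Qwen3-32B | FP8-Static | 86.92 | 81.78 | 70.20 | 39.63 |
| Qwen3-32B | FP8-Dynamic | 86.55 | 81.89 | 70.43 | 38.41 |
| Qwen3-32B | INT4-GPTQ | 86.18 | 81.01 | - | 43.29 |
| Qwen3-32B | INT4-AWQ | 86.18 | 81.54 | - | 36.59 |
| Qwen3-235B-A22B | BF16 | 89.60 | 86.28 | 85.29 | 27.44 |
| Qwen3-235B-A22B | FP8-Static | 89.67 | 86.19 | 86.96 | 27.44 |
| Qwen3-235B-A22B | FP8-Dynamic | 89.67 | 86.18 | 85.22 | 28.05 |
| Qwen3-235B-A22B | INT8-Dynamic | 88.93 | 86.20 | 86.20 | 23.78 |
| QwQ-32B | BF16 | 85.74 | 82.03 | 73.31 | 42.68 |
| QwQ-32B | FP8-Static | 85.44 | 81.91 | 75.36 | 42.68 |
| QwQ-32B | FP8-Dynamic | 85.07 | 81.93 | 75.66 | 42.07 |
| QwQ-32B | INT8-Dynamic | 86.40 | 81.97 | 74.37 | 45.73 |
| QwQ-32B | INT4-GPTQ | 84.03 | 81.26 | 68.23 | 45.73 |
| QwQ-32B | INT4-AWQ | 83.58 | 81.01 | 68.69 | 43.29 |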

Qwen2.5VL

Evaluation results for the Qwen2.5VL series models with BF16, FP8-Static, FP8-Dynamic, FP8-Static-ViT, FP8-Dynamic-ViT, INT4-GPTQ, and INT4-AWQ on MMMU_VAL, DocVQA_VAL, and ChartQA_TEST are as follows:

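| Model | Quantization | MMMU_VAL | DocVQA_VAL | ChartQA_TEST |
|---|---|---|---|---|
| Qwen2.5VL-3B | BF16 | 47.11 | 78.57 | 80.32 |
| Qwen2.5VL-3B | FP8-Static | 47.33 | 79.34 | 79.68 |
| Qwen2.5VL-3B | FP8-Dynamic | 47.00 | 78.92 | 79.60 |
| Qwen2.5VL-3B | FP8-Static-ViT | 45.56 | 79.36 | 80.16 |
| Qwen2.5VL-3B | INT8-Dynamic-ViT | 46.67 | 79.26 | 79.84 |
| Qwen2.5VL-3B | INT4-GPTQ | 46.56 | 77.20 | 78.96 |
| Qwen2.5VL-3B | INT4-AWQ | 45.78 | - | 79.60 |
| Qwen2.5VL-7B | BF16 | 45.44 | 89.71 | 84.64 |
| Qwen2.5VL-7B | FP8-Static | 47.00 | 89.83 | 85.92 |
| Qwen2.5VL-7B | FP8-Dynamic | 47.22 | 89.80 | 88.64 |
| Qwen2.5VL-7B | FP8-Static-ViT | 47.00 | 89.85 | 86.88 |
| Qwen2.5VL-7B | INT8-Dynamic-ViT | 46.44 | 89.68 | 88.72 |
| Qwen2.5VL-7B | INT4-GPTQ | 46.67 | 90.45 | - |
| Qwen2.5VL-7B | INT4-AWQ | 45.67 | 89.28 | - |
| Qwen2.5VL-32B | BF16 | 57.00 | 90.03 | - |
| Qwen2.5VL-32B | FP8-Static | 57.00 | 89.88 | - |
| Qwen2.5VL-32B | FP8-Dynamic | 56.44 | 89.88 | - |
| Qwen2.5VL-32B | FP8-Static-ViT | 56.33 | 89.92 | - |
| Qwen2.5VL-32B | INT8-Dynamic-ViT | 57.22 | 89.88 | - |
| Qwen2.5VL-32B | INT4-GPTQ | 55.22 | 89.80 | - |
| Qwen2.5VL-32B | INT4-AWQ | 55.22 | 90.30 | - |
| Qwen2.5VL-72B | BF16 | 58.78 | 94.39 | 85.60 |
| Qwen2.5VL-72B | FP8-Static | 57.89 | 94.41 | 85.84 |
| Qwen2.5VL-72B | FP8-Dynamic | 58.67 | 94.38 | 85.60 |
| Qwen2.5VL-72B | FP8-Static-ViT | 57.44 | 94.48 | 85.84 |
| Qwen2.5VL-72B | INT8-Dynamic-ViT | 58.22 | 94.47 | 86.00 |
| Qwen2.5VL-72B | INT4-GPTQ | 57.56 | 94.46 | 86.48 |
| Qwen2.5VL-72B | INT4-AWQ | 58.78 | 94.19 | 87.28 |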

DeepSeek-R1-0528

Evaluation results for the DeepSeek-R1-0528 model with FP8-Block-Wise and W4A8-FP8 on GPQA Diamond, AIME 2024, SimpleQA, and LiveCodeBench are as follows:

| Model | Quantization | GPQA Diamond | AIME 2024 | SimpleQA | LiveCodeBench |
|---|---|---|---|---|---|
| DeepSeek-R1-0528 | FP8-Block-Wise | 78.28 | 88.67 | 27.80 | 77.1 |
| DeepSeek-R1-0528 | W4A8-FP8 | 77.37 | 88.67 | 26.83 | 78.86 |

Seed-OSS-36B-Instruct

Evaluation results for the Seed-OSS-36B-Instruct model with FP8-Static and FP8-Dynamic on CEVAL, MMLU, GSM8K, and HUMANEVAL are as follows:

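| Model | Quantization | CEVAL | MMLU | GSM8K-strict | GSM8K-flexible | HUMANEVAL |
|---|---|---|---|---|---|---|
| Seed-OSS-36B-Instruct | BF16 | 88.19 | 82.97 | 70.36 | 97.12 | 87.20 |
| Seed-OSS-36B-Instruct | FP8-Static | 87.82 | 82.79 | 74.75 | 96.51 | 86.59 |
| Seed-OSS-36B-Instruct | FP8-Dynamic | 87.82 | 82.64 | 74.15 | 96.89 | 87.20 |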

These results were obtained with the lm-eval tool. Note that --gen_kwargs max_gen_toks must be set so that long reasoning output is not truncated.
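As an illustration of the note above, the following sketch assembles an lm_eval command line in Python with an explicit max_gen_toks. The vllm backend, the model path, the gsm8k task, and the 16384-token budget are placeholder assumptions for illustration, not the settings used to produce the table.

```python
import shlex


def build_lm_eval_command(model_path, tasks, max_gen_toks):
    """Assemble an lm_eval CLI invocation with a raised generation budget."""
    return [
        "lm_eval",
        "--model", "vllm",  # assumed backend; other lm-eval backends work similarly
        "--model_args", f"pretrained={model_path}",
        "--tasks", ",".join(tasks),
        # Raise the generation limit so long reasoning traces are not cut off
        "--gen_kwargs", f"max_gen_toks={max_gen_toks}",
    ]


# Hypothetical path and token budget, chosen only for the example
cmd = build_lm_eval_command("path/to/Seed-OSS-36B-Instruct", ["gsm8k"], 16384)
print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be passed directly to subprocess.run.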

GLM-4.6

Evaluation results for the GLM-4.6 model with FP8-Static and FP8-Dynamic on CEVAL, GSM8K, and HUMANEVAL are as follows:

| Model | Quantization | CEVAL | GSM8K | HUMANEVAL |
|---|---|---|---|---|
| GLM-4.6 | BF16 | 82.6 | 93.71 | 73.78 |
| GLM-4.6 | FP8-Static | 83.14 | 93.86 | 66.46 |
| GLM-4.6 | FP8-Dynamic | 82.91 | 93.71 | 63.41 |

Other Models

Evaluation results for other models with BF16, FP8-Static, FP8-Dynamic, INT4-GPTQ, and INT4-AWQ on CEVAL, MMLU, and GSM8K are as follows:

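| Model | Quantization | CEVAL | MMLU | GSM8K |
|---|---|---|---|---|
| Qwen2.5-1.5B-Instruct | BF16 | 67.01 | 60.05 | 54.28 |
| Qwen2.5-1.5B-Instruct | FP8-Static | 66.27 | 60.23 | - |
| Qwen2.5-1.5B-Instruct | FP8-Dynamic | 66.79 | 60.08 | 51.71 |
| Qwen2.5-7B-Instruct | BF16 | 81.20 | 74.55 | 79.98 |
| Qwen2.5-7B-Instruct | FP8-Static | 81.13 | 74.03 | 79.30 |
| Qwen2.5-7B-Instruct | FP8-Dynamic | 80.31 | 74.07 | 79.00 |
| Qwen2.5-7B-Instruct | INT4-GPTQ | 79.05 | 73.05 | 74.75 |
| Qwen2.5-7B-Instruct | INT4-AWQ | 79.35 | 73.22 | 79.38 |
| Qwen2.5-32B-Instruct | BF16 | 87.30 | 83.21 | 81.73 |
| Qwen2.5-32B-Instruct | FP8-Static | 87.59 | 83.08 | 81.58 |
| Qwen2.5-32B-Instruct | FP8-Dynamic | 87.30 | 83.04 | 81.58 |
| Qwen2.5-32B-Instruct | INT4-GPTQ | 86.70 | 82.45 | 82.03 |
| Qwen2.5-32B-Instruct | INT4-AWQ | 87.00 | 82.64 | - |
| DeepSeek-R1-Distill-Qwen-1.5B | BF16 | 37.22 | 36.63 | 67.02 |
| DeepSeek-R1-Distill-Qwen-1.5B | FP8-Static | 35.44 | 37.41 | - |
| DeepSeek-R1-Distill-Qwen-1.5B | FP8-Dynamic | 35.96 | 36.12 | 64.75 |
| DeepSeek-R1-Distill-Qwen-7B | BF16 | 53.49 | 53.80 | 75.74 |
| DeepSeek-R1-Distill-Qwen-7B | FP8-Static | 53.57 | 54.17 | 76.19 |
| DeepSeek-R1-Distill-Qwen-7B | FP8-Dynamic | 52.97 | 54.13 | 74.15 |
| DeepSeek-R1-Distill-Qwen-7B | INT4-GPTQ | 51.86 | 52.44 | 75.89 |
| DeepSeek-R1-Distill-Qwen-7B | INT4-AWQ | 53.49 | 53.70 | - |
| DeepSeek-R1-Distill-Qwen-14B | BF16 | 77.71 | 74.28 | 85.67 |
| DeepSeek-R1-Distill-Qwen-14B | FP8-Static | 77.56 | 74.66 | 86.73 |
| DeepSeek-R1-Distill-Qwen-14B | FP8-Dynamic | 76.82 | 74.63 | 87.11 |
| DeepSeek-R1-Distill-Qwen-14B | INT4-GPTQ | 74.29 | 72.37 | 84.61 |
| DeepSeek-R1-Distill-Qwen-14B | INT4-AWQ | 74.81 | 73.00 | 86.05 |
| DeepSeek-R1-Distill-Qwen-32B | BF16 | 84.18 | 80.89 | 87.41 |
| DeepSeek-R1-Distill-Qwen-32B | FP8-Static | 83.43 | 80.90 | 87.57 |
| DeepSeek-R1-Distill-Qwen-32B | FP8-Dynamic | 83.73 | 81.10 | 86.43 |
| DeepSeek-R1-Distill-Qwen-32B | INT4-GPTQ | 84.10 | 79.80 | 86.73 |
| DeepSeek-R1-Distill-Qwen-32B | INT4-AWQ | 82.84 | 80.15 | 87.19 |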

INT4-GPTAQ

Evaluation results for INT4-GPTAQ on GSM8K, HUMANEVAL, and GPQA Diamond are as follows:

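| Model | Quantization | GSM8K | HUMANEVAL | GPQA Diamond |
|---|---|---|---|---|
| Qwen3-4B | BF16 | 85.37 | 72.56 | 37.88 |
| Qwen3-4B | INT4-GPTQ | 81.65 | 61.59 | 35.35 |
| Qwen3-4B | INT4-GPTAQ | 82.56 | 64.02 | 39.39 |
| Qwen3-8B | BF16 | 87.79 | 63.41 | 32.32 |
| Qwen3-8B | INT4-GPTQ | 86.43 | 62.20 | 34.85 |
| Qwen3-8B | INT4-GPTAQ | 86.66 | 64.02 | 33.33 |
| Qwen3-32B | BF16 | 74.53 | 37.80 | 40.40 |
| Qwen3-32B | INT4-GPTQ | 65.58 | 43.29 | 40.40 |
| Qwen3-32B | INT4-GPTAQ | 69.52 | 37.20 | - |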

NVFP4

Evaluation results for NVFP4 on GSM8K, MMLU, and GPQA Diamond are as follows:

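| Model | Quantization | GSM8K | MMLU | GPQA Diamond |
|---|---|---|---|---|
| Qwen3-32B | BF16 | 67.06 | 81.72 | 54.04 |
| Qwen3-32B | NVFP4 | 69.87 | 80.74 | 56.06 |
| Qwen3-235B-A22B | BF16 | 96.63 | 62.73 | 60.60 |
| Qwen3-235B-A22B | NVFP4 | 96.17 | 62.09 | 60.10 |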

Qwen3VL

Evaluation results for the Qwen3VL series models with BF16, FP8-Static, and FP8-Dynamic on MMMU_VAL, DocVQA_VAL, and ChartQA_TEST are as follows:

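| Model | Quantization | MMMU_VAL | DocVQA_VAL | ChartQA_TEST |
|---|---|---|---|---|
| Qwen3-VL-32B-Instruct | BF16 | 60.11 | 96.08 | 94.64 |
| Qwen3-VL-32B-Instruct | FP8-Static | 61.22 | 96.00 | 94.64 |
| Qwen3-VL-32B-Instruct | FP8-Dynamic | 60.78 | 96.19 | 94.72 |
| Qwen3-VL-30B-A3B-Instruct | BF16 | 50.44 | 95.28 | 95.36 |
| Qwen3-VL-30B-A3B-Instruct | FP8-Dynamic | 50.67 | 95.25 | 95.20 |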

FP8-Dynamic uses block-wise quantization. Launch command: python3 tools/fp8_quant_blockwise.py --block_size --input_path --output_path
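To make "block-wise" concrete, the sketch below computes one scale per square tile of a weight matrix so that each tile's largest magnitude maps to the FP8 E4M3 maximum of 448. It is a pure-Python illustration under stated assumptions (2-D square tiles, E4M3 format, absmax scaling) and is not the code in tools/fp8_quant_blockwise.py. The function names are hypothetical, and the final rounding to FP8 is omitted.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def blockwise_scales(weight, block_size):
    """Return one absmax-based scale per (block_size x block_size) tile."""
    rows, cols = len(weight), len(weight[0])
    scales = []
    for r0 in range(0, rows, block_size):
        row_scales = []
        for c0 in range(0, cols, block_size):
            absmax = max(
                abs(weight[r][c])
                for r in range(r0, min(r0 + block_size, rows))
                for c in range(c0, min(c0 + block_size, cols))
            )
            # All-zero tiles get scale 1.0 to avoid dividing by zero later
            row_scales.append(absmax / FP8_E4M3_MAX if absmax > 0 else 1.0)
        scales.append(row_scales)
    return scales


def scale_into_fp8_range(weight, scales, block_size):
    """Divide each element by its tile's scale so it fits in [-448, 448]."""
    return [
        [w / scales[r // block_size][c // block_size] for c, w in enumerate(row)]
        for r, row in enumerate(weight)
    ]


# Toy 2x4 matrix split into two 2x2 tiles (values chosen only for the example)
weight = [
    [1.0, -2.0, 0.5, 0.25],
    [4.0, 0.0, -0.5, 0.125],
]
scales = blockwise_scales(weight, block_size=2)
scaled = scale_into_fp8_range(weight, scales, block_size=2)
print(scales)
print(scaled)
```

Because each tile has its own scale, an outlier in one tile does not reduce the resolution available to the other tiles, which is the usual motivation for block-wise over per-tensor scaling.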

Qwen3-Omni

Qwen3-Omni Text -> Text Benchmark

Evaluation results for the Qwen3-Omni model with BF16, FP8-Static, and FP8-Dynamic on aime25, gpqa_diamond, and mmlu_redux are as follows:

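| Model | Quantization | aime25 | gpqa_diamond | mmlu_redux |
|---|---|---|---|---|
| Qwen3-Omni-30B-A3B-Instruct | BF16 | 73.32 | 56.77 | 88.09 |
| Qwen3-Omni-30B-A3B-Instruct | FP8-Static | 71.33 | 56.57 | 87.91 |
| Qwen3-Omni-30B-A3B-Instruct | FP8-Dynamic | 73.33 | 55.15 | 88.07 |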