SUPERB
Excerpt
A comprehensive and reproducible benchmark for Self-supervised Speech Representation Learning
| Method | Name | Description | URL | Params | MACs | MACs (short) | MACs (medium) | MACs (long) | MACs (longer) | Rank | Score | KS | IC | PR | ASR | ER | QbE | SF-F1 | SF-CER | SID | ASV | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WavLM Large | Microsoft | M-P + VQ + GREP + Utterance Mixing | | 3.166e+8 | 4.326e+12 | 3.863e+11 | 6.764e+11 | 1.094e+12 | 2.169e+12 | 38 | 1145 | 97.86 | 99.31 | 3.06 | 3.44 | 70.62 | 8.86 | 92.21 | 18.36 | 95.49 | 3.77 | 3.24 |
| WavLM Base+ | Microsoft | M-P + VQ + GREP + Utterance Mixing | | 9.470e+7 | 1.670e+12 | 1.493e+11 | 2.614e+11 | 4.226e+11 | 8.367e+11 | 36.25 | 1106 | 97.37 | 99 | 3.92 | 5.59 | 68.65 | 9.88 | 90.58 | 21.2 | 89.42 | 4.07 | 3.5 |
| IIITD | MIDAS_IIITD | JSC | - | 9.618e+7 | 9.618e+7 | 9.618e+7 | 9.618e+7 | 9.618e+7 | 9.618e+7 | 32.65 | 1080 | 97.34 | 98.21 | 5.54 | 7.09 | 68.25 | 10.82 | 88.64 | 24.38 | 85.36 | 4.33 | 3.78 |
| WavLM Base | Microsoft | M-P + VQ + GREP + Utterance Mixing | | 9.470e+7 | 1.670e+12 | 1.493e+11 | 2.614e+11 | 4.226e+11 | 8.367e+11 | 32.05 | 1019 | 96.79 | 98.63 | 4.84 | 6.21 | 65.94 | 8.7 | 89.38 | 22.86 | 84.51 | 4.69 | 4.55 |
| LightHuBERT Stage 1 | LightHuBERT | Once-for-All HuBERT + Two-Stage Distillation | | 9.500e+7 | - | - | - | - | - | 30.8 | 959 | 96.82 | 98.5 | 4.15 | 5.71 | 66.25 | 7.37 | 88.44 | 25.92 | 80.01 | 5.14 | 5.51 |
| data2vec Large | Cl Tang | Masked Generative | | 3.143e+8 | 4.306e+12 | 3.841e+11 | 6.735e+11 | 1.089e+12 | 2.159e+12 | 30.2 | 949 | 96.75 | 98.31 | 3.6 | 3.36 | 66.31 | 6.28 | 90.98 | 22.16 | 76.77 | 5.73 | 5.53 |
| data2vec-aqc Base | Speech Lab, IITM | Masked Generative (M-G) + M-C + VQ | | 9.384e+7 | 1.657e+12 | 1.480e+11 | 2.594e+11 | 4.192e+11 | 8.300e+11 | 27.95 | 935 | 96.36 | 98.92 | 4.11 | 5.39 | 67.59 | 6.65 | 89.39 | 22.88 | 59.87 | 5.82 | 4.84 |
| HuBERT Base | paper | M-P + VQ | | 9.470e+7 | 1.669e+12 | 1.493e+11 | 2.613e+11 | 4.224e+11 | 8.363e+11 | 27.65 | 941 | 96.3 | 98.34 | 5.41 | 6.42 | 64.92 | 7.36 | 88.53 | 25.2 | 81.42 | 5.11 | 5.88 |
| HuBERT Large | paper | M-P + VQ | | 3.166e+8 | 4.324e+12 | 3.861e+11 | 6.761e+11 | 1.094e+12 | 2.168e+12 | 27.55 | 919 | 95.29 | 98.76 | 3.53 | 3.62 | 67.62 | 3.53 | 89.81 | 21.76 | 90.33 | 5.98 | 5.75 |
| CoBERT Base | ByteDance AI Lab | Code Representation Learning + Self-Distillation | | 9.435e+7 | 1.660e+12 | 1.480e+11 | 2.594e+11 | 4.192e+11 | 8.300e+11 | 26.7 | 894 | 96.36 | 98.87 | 3.08 | 4.74 | 65.32 | 5.07 | 89.04 | 23.35 | 72.66 | 6.13 | 5.74 |
| ccc-wav2vec 2.0 Base | Speech Lab, IITM | M-C + VQ | | 9.504e+7 | 1.670e+12 | 1.493e+11 | 2.617e+11 | 4.228e+11 | 8.367e+11 | 26.4 | 940 | 96.72 | 96.47 | 5.95 | 6.3 | 64.17 | 6.73 | 88.08 | 24.34 | 72.84 | 5.61 | 4.27 |
| wav2vec 2.0 Large | paper | M-C + VQ | | 3.174e+8 | 4.326e+12 | 3.861e+11 | 6.762e+11 | 1.094e+12 | 2.169e+12 | 26.15 | 914 | 96.66 | 95.28 | 4.75 | 3.75 | 65.64 | 4.89 | 87.11 | 27.31 | 86.14 | 5.65 | 5.62 |
| data2vec base | Cl Tang | Masked Generative (M-G) | | 9.375e+7 | 1.657e+12 | 1.480e+11 | 2.594e+11 | 4.192e+11 | 8.300e+11 | 25.05 | 884 | 96.56 | 97.63 | 4.69 | 4.94 | 66.27 | 5.76 | 88.59 | 25.27 | 70.21 | 5.77 | 6.67 |
| STaRHuBERT-L | Kangwook Jang | Temporal Gram Matrix Distillation | - | 2.663e+7 | 5.119e+11 | 4.406e+10 | 7.793e+10 | 1.278e+11 | 2.621e+11 | 24.85 | 901 | 96.56 | 97.5 | 7.39 | 8.9 | 63.48 | 7 | 88.01 | 25.36 | 78.66 | 5.45 | 5.83 |
| DPWavLM | Yifan Peng | DPWavLM is a task-agnostic compression method based on joint distillation and structured pruning. | | 2.359e+7 | 5.892e+11 | 5.356e+10 | 9.334e+10 | 1.499e+11 | 2.924e+11 | 24.5 | 926 | 96.27 | 98.58 | 8.22 | 10.19 | 65.24 | 8.74 | 87.68 | 26.11 | 82.11 | 5.98 | 5.53 |
| LightHuBERT Small | LightHuBERT | Once-for-All HuBERT + Two-Stage Distillation | | 2.700e+7 | 8.607e+11 | 7.721e+10 | 1.351e+11 | 2.180e+11 | 4.304e+11 | 23.8 | 901 | 96.07 | 98.23 | 6.6 | 8.34 | 64.12 | 7.64 | 87.58 | 26.9 | 69.7 | 5.42 | 5.85 |
| ARMwavLM-S | Kangwook Jang | Attention map reusing + Masking distillation | | 2.239e+7 | 4.499e+11 | 3.924e+10 | 6.915e+10 | 1.129e+11 | 2.287e+11 | 22.65 | 861 | 96.98 | 97.76 | 7.43 | 9.95 | 64.08 | 7.41 | 87.46 | 26.09 | 71.18 | 5.9 | 6.78 |
| STaRHuBERT | Kangwook Jang | Temporal Gram Matrix Distillation | - | 2.231e+7 | 4.635e+11 | 3.953e+10 | 7.009e+10 | 1.154e+11 | 2.385e+11 | 21.95 | 880 | 96.27 | 97.55 | 7.97 | 9.35 | 63.01 | 6.88 | 87.94 | 25.31 | 77.58 | 5.71 | 6.05 |
| FaST-VGS+ | Puyuan Peng, David Harwath | FaST-VGS loss + w2v2 loss | - | 2.172e+8 | - | - | - | - | - | 21.5 | 809 | 97.27 | 98.97 | 7.76 | 8.83 | 62.71 | 5.62 | 88.15 | 27.12 | 41.34 | 5.87 | 6.05 |
| DPHuBERT | Yifan Peng | DPHuBERT is a task-agnostic compression method based on joint distillation and structured pruning. | | 2.359e+7 | 6.541e+11 | 5.960e+10 | 1.038e+11 | 1.666e+11 | 3.241e+11 | 20.9 | 866 | 96.36 | 97.92 | 9.67 | 10.47 | 63.16 | 6.93 | 86.86 | 28.26 | 76.83 | 5.84 | 5.92 |
| ARMHuBERT | Kangwook Jang | Attention map reusing + Masking distillation | | 2.645e+7 | 5.016e+11 | 4.375e+10 | 7.710e+10 | 1.258e+11 | 2.549e+11 | 20.55 | 832 | 97.05 | 97.23 | 7.73 | 10.08 | 62.77 | 6.35 | 87.21 | 26.88 | 65.19 | 5.65 | 6.78 |
| wav2vec 2.0 Base | paper | M-C + VQ | | 9.504e+7 | 1.669e+12 | 1.493e+11 | 2.613e+11 | 4.224e+11 | 8.363e+11 | 19.5 | 818 | 96.23 | 92.35 | 5.74 | 6.43 | 63.43 | 2.33 | 88.3 | 24.77 | 75.18 | 6.02 | 6.08 |
| STaRHuBERT-S | Kangwook Jang | Temporal Gram Matrix Distillation | - | 1.411e+7 | 3.563e+11 | 3.036e+10 | 5.384e+10 | 8.865e+10 | 1.835e+11 | 17.75 | 847 | 95.98 | 96.18 | 10.08 | 10.29 | 62.03 | 6.67 | 87.03 | 27.79 | 70.09 | 5.82 | 5.88 |
| STaRHuBERT-XS | Kangwook Jang | Temporal Gram Matrix Distillation | - | 9.393e+6 | 2.959e+11 | 2.513e+10 | 4.461e+10 | 7.354e+10 | 1.526e+11 | 15.3 | 804 | 95.33 | 94.12 | 11.83 | 11.37 | 61.24 | 6.78 | 85.9 | 29.42 | 64.77 | 5.95 | 6.49 |
| DistilHuBERT | Heng-Jui Chang | multi-task layer-wise distillation | - | 2.349e+7 | 7.859e+11 | 7.251e+10 | 1.259e+11 | 2.010e+11 | 3.865e+11 | 15 | 717 | 95.98 | 94.99 | 16.27 | 13.37 | 63.02 | 5.11 | 82.57 | 35.59 | 73.54 | 8.55 | 6.19 |
| DeCoAR 2.0 | paper | M-G + VQ | | 8.984e+7 | 1.114e+12 | 9.719e+10 | 1.713e+11 | 2.796e+11 | 5.661e+11 | 13.55 | 722 | 94.48 | 90.8 | 14.93 | 13.02 | 62.47 | 4.06 | 83.28 | 34.73 | 74.42 | 7.16 | 6.59 |
| wav2vec | paper | F-C | | 3.254e+7 | 1.086e+12 | 1.016e+11 | 1.760e+11 | 2.795e+11 | 5.291e+11 | 10.5 | 529 | 95.59 | 84.92 | 31.58 | 15.86 | 59.79 | 4.85 | 76.37 | 43.71 | 56.56 | 7.99 | 9.9 |
| admin_baseline | Leo Yang | Used to make sure the server is working correctly | - | 0.000e+0 | 0.000e+0 | 0.000e+0 | 0.000e+0 | 0.000e+0 | 0.000e+0 | 9.1 | 370 | 95.94 | 74.69 | 41.98 | 24.28 | 66.67 | 1.77 | 70.46 | 51.57 | 60.42 | 10.03 | 10.53 |
| vq-wav2vec | paper | F-C + VQ | | 3.415e+7 | 1.118e+12 | 1.046e+11 | 1.813e+11 | 2.878e+11 | 5.449e+11 | 8.4 | 422 | 93.38 | 85.68 | 33.48 | 17.71 | 58.24 | 4.1 | 77.68 | 41.54 | 38.8 | 10.38 | 9.93 |
| WavLM Base+ | Lawrance | WavLM Base+ | - | 9.470e+7 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 7.4 | - | 96.92 | - | 4.64 | - | - | - | - | - | - | - | - |
| VQ-APC | paper | F-G + VQ | | 4.630e+6 | 5.135e+11 | 1.814e+10 | 3.327e+10 | 5.256e+10 | 9.447e+10 | 7.05 | 377 | 91.11 | 74.48 | 41.08 | 21.2 | 59.66 | 2.51 | 68.53 | 52.91 | 60.15 | 8.72 | 10.45 |
| APC | paper | F-G | | 4.105e+6 | 5.017e+11 | 1.704e+10 | 3.137e+10 | 4.953e+10 | 8.874e+10 | 6.95 | 392 | 91.01 | 74.69 | 41.98 | 21.28 | 59.33 | 3.1 | 70.46 | 50.89 | 60.42 | 8.56 | 10.53 |
| NPC | paper | M-G + VQ | | 1.938e+7 | 4.349e+11 | 4.063e+10 | 7.043e+10 | 1.119e+11 | 2.119e+11 | 6.6 | 386 | 88.96 | 69.44 | 43.81 | 20.2 | 59.08 | 2.46 | 72.79 | 48.44 | 55.92 | 9.4 | 9.34 |
| modified CPC | paper | F-C | | 1.843e+6 | 2.026e+11 | 1.510e+10 | 2.635e+10 | 4.175e+10 | 7.832e+10 | 6.4 | 278 | 91.88 | 64.09 | 42.54 | 20.18 | 60.96 | 3.26 | 71.19 | 49.91 | 39.63 | 12.86 | 10.38 |
| Hubert Base | Lawrance | PR and KS | - | 9.440e+7 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 5.8 | - | 96.53 | - | 5.99 | - | - | - | - | - | - | - | - |
| Wav2Vec 2.0 | Lawrance | Wav2Vec2-Base-960h | - | 9.500e+7 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 5.2 | - | 96.33 | - | 6.49 | - | - | - | - | - | - | - | - |
| TERA | paper | time/freq M-G | | 2.133e+7 | 5.677e+11 | 4.789e+10 | 8.579e+10 | 1.432e+11 | 2.908e+11 | 4.2 | 150 | 89.48 | 58.42 | 49.17 | 18.17 | 56.27 | 0.13 | 67.5 | 54.17 | 57.57 | 15.89 | 9.96 |
| layer10 | gaeulisautumn | 1 | - | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 3.2 | - | - | - | - | - | - | 7.29 | - | - | - | - | - |
| HuBERT | gaeulisautumn | dtw to 1.3.0 | - | 2.000e+0 | 2.000e+0 | 2.000e+0 | 2.000e+0 | 2.000e+0 | 2.000e+0 | 3.1 | - | - | - | - | - | - | 7.19 | - | - | - | - | - |
| PASE+ | paper | multi-task | | 7.833e+6 | 4.954e+11 | 4.648e+10 | 8.036e+10 | 1.275e+11 | 2.411e+11 | 3.05 | 149 | 82.54 | 29.82 | 58.87 | 25.11 | 57.86 | 0.72 | 62.14 | 60.17 | 37.99 | 11.61 | 8.68 |
| b0990106x | éłäșç | wav2vec2-ctc PR | - | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 2.8 | - | - | - | 6.28 | - | - | - | - | - | - | - | - |
| FBANK | paper | classic feature | | 0.000e+0 | 4.791e+8 | 4.477e+7 | 7.760e+7 | 1.233e+8 | 2.334e+8 | 2.15 | 0 | 41.38 | 9.65 | 82.01 | 23.18 | 48.24 | 0.58 | 69.64 | 52.94 | 20.06 | 9.56 | 10.05 |
| distilHubert_base KS | éłäșç | distilHubert_base KS | - | 0.000e+0 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 2 | - | 96.2 | - | - | - | - | - | - | - | - | - | - |
| wav2vec2 SF | éłäșç | wav2vec2 SF | - | 0.000e+0 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.95 | - | - | - | - | - | - | - | 86.89 | 26.44 | - | - | - |
| Mockingjay | paper | time M-G | | 8.512e+7 | 2.076e+12 | 1.909e+11 | 3.368e+11 | 5.317e+11 | 1.017e+12 | 1.75 | 54 | 83.67 | 34.33 | 70.19 | 22.82 | 50.28 | 0.07 | 61.59 | 58.89 | 32.29 | 11.66 | 10.54 |
| wav2vec2_large | gaeulisautumn | dtw to 1.3.0 | - | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.7 | - | - | - | - | - | - | 5.03 | - | - | - | - | - |
| distilhubert_base | éłäșç | distilhubert_base ctc PR | - | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.6 | - | - | - | 14.11 | - | - | - | - | - | - | - | - |
| wav2vec2 KS | éłäșç | wav2vec2 KS | - | 0.000e+0 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.5 | - | 95.85 | - | - | - | - | - | - | - | - | - | - |
| distilHubert_base SF | éłäșç | distilHubert_base SF | - | 0.000e+0 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.35 | - | - | - | - | - | - | - | 82.94 | 34.36 | - | - | - |
| HuBERT_Large | gaeulisautumn | dtw to 1.3.0 | - | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.1 | - | - | - | - | - | - | 3.4 | - | - | - | - | - |
| wav2vec2 | gaeulisautumn | dtw to 1.3.0 | - | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 1.000e+0 | 0.5 | - | - | - | - | - | - | 2.13 | - | - | - | - | - |
| SF | èŹæżäżź | sf | - | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 0.4 | - | 8.15 | - | 81.96 | - | - | - | 68.15 | 53.53 | - | - | - |
| Alan | èŹæżäżź | PR | - | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 0.1 | - | - | - | 81.96 | - | - | - | - | - | - | - | - |
| KS | èŹæżäżź | ks | - | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 1.600e+2 | 0.1 | - | 8.15 | - | 81.96 | - | - | - | - | - | - | - | - |