SuperGLUE Benchmark

SuperGLUE is a new benchmark styled after the original GLUE benchmark, with a set of more difficult language understanding tasks, improved resources, and a new public leaderboard.


| Rank | Name | Model | Score | BoolQ | CB | COPA | MultiRC | ReCoRD | RTE | WiC | WSC | AX-b | AX-g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | JDExplore d-team | Vega v2 | 91.3 | 90.5 | 98.6/99.2 | 99.4 | 88.2/62.4 | 94.4/93.9 | 96.0 | 77.4 | 98.6 | -0.4 | 100.0/50.0 |
| 2 | Liam Fedus | ST-MoE-32B | 91.2 | 92.4 | 96.9/98.0 | 99.2 | 89.6/65.8 | 95.1/94.4 | 93.5 | 77.7 | 96.6 | 72.3 | 96.1/94.1 |
| 3 | Microsoft Alexander v-team | Turing NLR v5 | 90.9 | 92.0 | 95.9/97.6 | 98.2 | 88.4/63.0 | 96.4/95.9 | 94.1 | 77.1 | 97.3 | 67.8 | 93.3/95.5 |
| 4 | ERNIE Team - Baidu | ERNIE 3.0 | 90.6 | 91.0 | 98.6/99.2 | 97.4 | 88.6/63.2 | 94.7/94.2 | 92.6 | 77.4 | 97.3 | 68.6 | 92.7/94.7 |
| 5 | Yi Tay | PaLM 540B | 90.4 | 91.9 | 94.4/96.0 | 99.0 | 88.7/63.6 | 94.2/93.3 | 94.1 | 77.4 | 95.9 | 72.9 | 95.5/90.4 |
| 6 | Zirui Wang | T5 + UDG, Single Model (Google Brain) | 90.4 | 91.4 | 95.8/97.6 | 98.0 | 88.3/63.0 | 94.2/93.5 | 93.0 | 77.9 | 96.6 | 69.1 | 92.7/91.9 |
| 7 | DeBERTa Team - Microsoft | DeBERTa / TuringNLRv4 | 90.3 | 90.4 | 95.7/97.6 | 98.4 | 88.2/63.7 | 94.5/94.1 | 93.2 | 77.5 | 95.9 | 66.7 | 93.3/93.8 |
| 8 | SuperGLUE Human Baselines | SuperGLUE Human Baselines | 89.8 | 89.0 | 95.8/98.9 | 100.0 | 81.8/51.9 | 91.7/91.3 | 93.6 | 80.0 | 100.0 | 76.6 | 99.3/99.7 |
| 9 | T5 Team - Google | T5 | 89.3 | 91.2 | 93.9/96.8 | 94.8 | 88.1/63.3 | 94.1/93.4 | 92.5 | 76.9 | 93.8 | 65.6 | 92.7/91.9 |
| 10 | SPoT Team - Google | Frozen T5 1.1 + SPoT | 89.2 | 91.1 | 95.8/97.6 | 95.6 | 87.9/61.9 | 93.3/92.4 | 92.9 | 75.8 | 93.8 | 66.9 | 83.1/82.6 |
| 11 | Huawei Noah’s Ark Lab | NEZHA-Plus | 86.7 | 87.8 | 94.4/96.0 | 93.6 | 84.6/55.1 | 90.1/89.6 | 89.1 | 74.6 | 93.2 | 58.0 | 87.1/74.4 |
| 12 | Alibaba PAI&ICBU | PAI Albert | 86.1 | 88.1 | 92.4/96.4 | 91.8 | 84.6/54.7 | 89.0/88.3 | 88.8 | 74.1 | 93.2 | 75.6 | 98.3/99.2 |
| 13 | Infosys : DAWN : AI Research | RoBERTa-iCETS | 86.0 | 88.5 | 93.2/95.2 | 91.2 | 86.4/58.2 | 89.9/89.3 | 89.9 | 72.9 | 89.0 | 61.8 | 88.8/81.5 |
| 14 | Tencent Jarvis Lab | RoBERTa (ensemble) | 85.9 | 88.2 | 92.5/95.6 | 90.8 | 84.4/53.4 | 91.5/91.0 | 87.9 | 74.1 | 91.8 | 57.6 | 89.3/75.6 |
| 15 | Zhuiyi Technology | RoBERTa-mtl-adv | 85.7 | 87.1 | 92.4/95.6 | 91.2 | 85.1/54.3 | 91.7/91.3 | 88.1 | 72.1 | 91.8 | 58.5 | 91.0/78.1 |
| 16 | Facebook AI | RoBERTa | 84.6 | 87.1 | 90.5/95.2 | 90.6 | 84.4/52.5 | 90.6/90.0 | 88.2 | 69.9 | 89.0 | 57.9 | 91.0/78.1 |
| 17 | Anuar Sharafudinov | AILabs Team, Transformers | 82.6 | 88.1 | 91.6/94.8 | 86.8 | 85.1/54.7 | 82.8/79.8 | 88.9 | 74.1 | 78.8 | 100.0 | 100.0/100.0 |
| 18 | Ying Luo | FSL++(ALBERT)-Few-Shot(32 Examples) | 77.7 | 81.1 | 87.8/92.0 | 87.0 | 77.3/38.4 | 81.9/81.1 | 75.1 | 60.5 | 88.4 | 35.9 | 94.4/63.5 |
| 19 | Rathin Bector | Text to Text PETL | 77.0 | 82.0 | 86.9/92.4 | 80.2 | 80.4/44.8 | 82.2/81.3 | 78.1 | 67.6 | 74.0 | 38.1 | 97.2/53.7 |
| 20 | CASIA | INSTALL(ALBERT)-few-shot | 76.6 | 78.4 | 85.9/92.0 | 85.6 | 75.9/35.1 | 84.3/83.5 | 74.9 | 60.9 | 84.9 | -0.4 | 100.0/50.0 |
| 21 | Rakesh Radhakrishnan Menon | ADAPET (ALBERT) - few-shot | 76.0 | 80.0 | 82.3/92.0 | 85.4 | 76.2/35.7 | 86.1/85.5 | 75.0 | 53.5 | 85.6 | -0.4 | 100.0/50.0 |
| 22 | Timo Schick | iPET (ALBERT) - Few-Shot (32 Examples) | 75.4 | 81.2 | 79.9/88.8 | 90.8 | 74.1/31.7 | 85.9/85.4 | 70.8 | 49.3 | 88.4 | 36.2 | 97.8/57.9 |
| 23 | Adrian de Wynter | Bort (Alexa AI) | 74.1 | 83.7 | 81.9/86.4 | 89.6 | 83.7/54.1 | 49.8/49.0 | 81.2 | 70.1 | 65.8 | 48.0 | 96.1/61.5 |
| 24 | IBM Research AI | BERT-mtl | 73.5 | 84.8 | 89.6/94.0 | 73.8 | 73.2/30.5 | 74.6/74.0 | 84.1 | 66.2 | 61.0 | 29.6 | 97.8/57.3 |
| 25 | Ben Mann | GPT-3 few-shot - OpenAI | 71.8 | 76.4 | 52.0/75.6 | 92.0 | 75.4/30.5 | 91.1/90.2 | 69.0 | 49.4 | 80.1 | 21.1 | 90.4/55.3 |
| 26 | SuperGLUE Baselines | BERT++ | 71.5 | 79.0 | 84.8/90.4 | 73.8 | 70.0/24.1 | 72.0/71.3 | 79.0 | 69.6 | 64.4 | 38.0 | 99.4/51.4 |
|  |  | BERT | 69.0 | 77.4 | 75.7/83.6 | 70.6 | 70.0/24.1 | 72.0/71.3 | 71.7 | 69.6 | 64.4 | 23.0 | 97.8/51.7 |
|  |  | Most Frequent Class | 47.1 | 62.3 | 21.7/48.4 | 50.0 | 61.1/0.3 | 33.4/32.5 | 50.3 | 50.0 | 65.1 | 0.0 | 100.0/50.0 |
|  |  | CBoW | 44.5 | 62.2 | 49.0/71.2 | 51.6 | 0.0/0.5 | 14.0/13.6 | 49.7 | 53.1 | 65.1 | -0.4 | 100.0/50.0 |
|  |  | Outside Best | - | 80.4 | - | 84.4 | 70.4/24.5 | 74.8/73.0 | 82.7 | - | - | - | - |
| 27 | Jeff Yang | select-step-by-step | 51.9 | 62.2 | 68.2/76.0 | 96.4 | 0.0/0.5 | 14.0/13.6 | 49.7 | 53.1 | 67.8 | -0.4 | 100.0/50.0 |
| 28 | Karen Hambardzumyan | WARP (ALBERT-XXL-V2) - Few-Shot (32 Examples) | 48.7 | 62.2 | 70.2/82.4 | 51.6 | 0.0/0.5 | 14.0/13.6 | 69.1 | 53.1 | 63.7 | -0.4 | 100.0/50.0 |
| - | Stanford Hazy Research | Snorkel [SuperGLUE v1.9] | - | - | 88.6/93.2 | 76.2 | 76.4/36.3 | - | 78.9 | 72.1 | 72.6 | 47.6 | - |
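
This page does not say how the Score column is derived. The rows are consistent with an unweighted mean over the eight main tasks (BoolQ through WSC), where a task reporting two metrics is first averaged into one number and the diagnostic sets AX-b and AX-g are left out. The Python sketch below works under that assumption and checks it against the ST-MoE-32B and human-baseline rows. The function names `task_score` and `superglue_score` are hypothetical helpers for illustration, not part of any official tooling.

```python
from typing import List


def task_score(cell: str) -> float:
    """Average the metrics in one table cell, e.g. '96.9/98.0' -> 97.45."""
    parts = [float(p) for p in cell.split("/")]
    return sum(parts) / len(parts)


def superglue_score(cells: List[str]) -> float:
    """Unweighted mean of the per-task scores (assumed scoring rule)."""
    return sum(task_score(c) for c in cells) / len(cells)


# Cells in leaderboard order: BoolQ, CB, COPA, MultiRC, ReCoRD, RTE, WiC, WSC.
# AX-b and AX-g are deliberately omitted (assumed not to count toward Score).
st_moe_32b = ["92.4", "96.9/98.0", "99.2", "89.6/65.8",
              "95.1/94.4", "93.5", "77.7", "96.6"]
human = ["89.0", "95.8/98.9", "100.0", "81.8/51.9",
         "91.7/91.3", "93.6", "80.0", "100.0"]

print(round(superglue_score(st_moe_32b), 1))  # 91.2, matching the table
print(round(superglue_score(human), 1))       # 89.8, matching the table
```

Averaging within a task before averaging across tasks keeps two-metric tasks such as CB or MultiRC from counting twice in the overall score.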