🪴 Anil's Garden

SuperGLUE Benchmark

Excerpt

SuperGLUE is a new benchmark styled after the original GLUE benchmark, with a set of more difficult language understanding tasks, improved resources, and a new public leaderboard.

Rank	Name	Model	Score	BoolQ	CB	COPA	MultiRC	ReCoRD	RTE	WiC	WSC	AX-b	AX-g
1	JDExplore d-team	Vega v2	91.3	90.5	98.6/99.2	99.4	88.2/62.4	94.4/93.9	96.0	77.4	98.6	-0.4	100.0/50.0
2	Liam Fedus	ST-MoE-32B	91.2	92.4	96.9/98.0	99.2	89.6/65.8	95.1/94.4	93.5	77.7	96.6	72.3	96.1/94.1
3	Microsoft Alexander v-team	Turing NLR v5	90.9	92.0	95.9/97.6	98.2	88.4/63.0	96.4/95.9	94.1	77.1	97.3	67.8	93.3/95.5
4	ERNIE Team - Baidu	ERNIE 3.0	90.6	91.0	98.6/99.2	97.4	88.6/63.2	94.7/94.2	92.6	77.4	97.3	68.6	92.7/94.7
5	Yi Tay	PaLM 540B	90.4	91.9	94.4/96.0	99.0	88.7/63.6	94.2/93.3	94.1	77.4	95.9	72.9	95.5/90.4
6	Zirui Wang	T5 + UDG, Single Model (Google Brain)	90.4	91.4	95.8/97.6	98.0	88.3/63.0	94.2/93.5	93.0	77.9	96.6	69.1	92.7/91.9
7	DeBERTa Team - Microsoft	DeBERTa / TuringNLRv4	90.3	90.4	95.7/97.6	98.4	88.2/63.7	94.5/94.1	93.2	77.5	95.9	66.7	93.3/93.8
8	SuperGLUE Human Baselines	SuperGLUE Human Baselines	89.8	89.0	95.8/98.9	100.0	81.8/51.9	91.7/91.3	93.6	80.0	100.0	76.6	99.3/99.7
9	T5 Team - Google	T5	89.3	91.2	93.9/96.8	94.8	88.1/63.3	94.1/93.4	92.5	76.9	93.8	65.6	92.7/91.9
10	SPoT Team - Google	Frozen T5 1.1 + SPoT	89.2	91.1	95.8/97.6	95.6	87.9/61.9	93.3/92.4	92.9	75.8	93.8	66.9	83.1/82.6
11	Huawei Noah’s Ark Lab	NEZHA-Plus	86.7	87.8	94.4/96.0	93.6	84.6/55.1	90.1/89.6	89.1	74.6	93.2	58.0	87.1/74.4
12	Alibaba PAI&ICBU	PAI Albert	86.1	88.1	92.4/96.4	91.8	84.6/54.7	89.0/88.3	88.8	74.1	93.2	75.6	98.3/99.2
13	Infosys : DAWN : AI Research	RoBERTa-iCETS	86.0	88.5	93.2/95.2	91.2	86.4/58.2	89.9/89.3	89.9	72.9	89.0	61.8	88.8/81.5
14	Tencent Jarvis Lab	RoBERTa (ensemble)	85.9	88.2	92.5/95.6	90.8	84.4/53.4	91.5/91.0	87.9	74.1	91.8	57.6	89.3/75.6
15	Zhuiyi Technology	RoBERTa-mtl-adv	85.7	87.1	92.4/95.6	91.2	85.1/54.3	91.7/91.3	88.1	72.1	91.8	58.5	91.0/78.1
16	Facebook AI	RoBERTa	84.6	87.1	90.5/95.2	90.6	84.4/52.5	90.6/90.0	88.2	69.9	89.0	57.9	91.0/78.1
17	Anuar Sharafudinov	AILabs Team, Transformers	82.6	88.1	91.6/94.8	86.8	85.1/54.7	82.8/79.8	88.9	74.1	78.8	100.0	100.0/100.0
18	Ying Luo	FSL++(ALBERT)-Few-Shot(32 Examples)	77.7	81.1	87.8/92.0	87.0	77.3/38.4	81.9/81.1	75.1	60.5	88.4	35.9	94.4/63.5
19	Rathin Bector	Text to Text PETL	77.0	82.0	86.9/92.4	80.2	80.4/44.8	82.2/81.3	78.1	67.6	74.0	38.1	97.2/53.7
20	CASIA	INSTALL(ALBERT)-few-shot	76.6	78.4	85.9/92.0	85.6	75.9/35.1	84.3/83.5	74.9	60.9	84.9	-0.4	100.0/50.0
21	Rakesh Radhakrishnan Menon	ADAPET (ALBERT) - few-shot	76.0	80.0	82.3/92.0	85.4	76.2/35.7	86.1/85.5	75.0	53.5	85.6	-0.4	100.0/50.0
22	Timo Schick	iPET (ALBERT) - Few-Shot (32 Examples)	75.4	81.2	79.9/88.8	90.8	74.1/31.7	85.9/85.4	70.8	49.3	88.4	36.2	97.8/57.9
23	Adrian de Wynter	Bort (Alexa AI)	74.1	83.7	81.9/86.4	89.6	83.7/54.1	49.8/49.0	81.2	70.1	65.8	48.0	96.1/61.5
24	IBM Research AI	BERT-mtl	73.5	84.8	89.6/94.0	73.8	73.2/30.5	74.6/74.0	84.1	66.2	61.0	29.6	97.8/57.3
25	Ben Mann	GPT-3 few-shot - OpenAI	71.8	76.4	52.0/75.6	92.0	75.4/30.5	91.1/90.2	69.0	49.4	80.1	21.1	90.4/55.3
26	SuperGLUE Baselines	BERT++	71.5	79.0	84.8/90.4	73.8	70.0/24.1	72.0/71.3	79.0	69.6	64.4	38.0	99.4/51.4
		BERT	69.0	77.4	75.7/83.6	70.6	70.0/24.1	72.0/71.3	71.7	69.6	64.4	23.0	97.8/51.7
		Most Frequent Class	47.1	62.3	21.7/48.4	50.0	61.1/0.3	33.4/32.5	50.3	50.0	65.1	0.0	100.0/50.0
		CBoW	44.5	62.2	49.0/71.2	51.6	0.0/0.5	14.0/13.6	49.7	53.1	65.1	-0.4	100.0/50.0
		Outside Best	-	80.4	-	84.4	70.4/24.5	74.8/73.0	82.7	-	-	-	-
27	Jeff Yang	select-step-by-step	51.9	62.2	68.2/76.0	96.4	0.0/0.5	14.0/13.6	49.7	53.1	67.8	-0.4	100.0/50.0
28	Karen Hambardzumyan	WARP (ALBERT-XXL-V2) - Few-Shot (32 Examples)	48.7	62.2	70.2/82.4	51.6	0.0/0.5	14.0/13.6	69.1	53.1	63.7	-0.4	100.0/50.0
-	Stanford Hazy Research	Snorkel [SuperGLUE v1.9]	-	-	88.6/93.2	76.2	76.4/36.3	-	78.9	72.1	72.6	47.6	-
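
The Score column appears to be an unweighted average over the eight task columns from BoolQ through WSC, where a cell holding two metrics (such as 98.6/99.2) is first averaged into a single number, and the diagnostic columns AX-b and AX-g are left out. That rule is an assumption inferred from the rows above, not something stated in this note. A minimal Python sketch that checks it against the Vega v2 row (the function names are illustrative only):

```python
def task_score(cell: str) -> float:
    """Collapse one table cell to a single number.

    A cell such as "98.6/99.2" reports two metrics; this sketch assumes
    they are averaged with equal weight.
    """
    parts = [float(p) for p in cell.split("/")]
    return sum(parts) / len(parts)


def overall_score(cells: list[str]) -> float:
    """Unweighted mean of the per-task scores (assumed aggregation rule)."""
    return sum(task_score(c) for c in cells) / len(cells)


# Vega v2 row, columns BoolQ, CB, COPA, MultiRC, ReCoRD, RTE, WiC, WSC.
# AX-b and AX-g are diagnostics and are assumed not to count toward Score.
vega_v2 = ["90.5", "98.6/99.2", "99.4", "88.2/62.4",
           "94.4/93.9", "96.0", "77.4", "98.6"]

print(round(overall_score(vega_v2), 1))
```

Running the same calculation on the ST-MoE-32B row gives a value that rounds to the 91.2 listed in the table, which supports the assumed rule for at least those two rows.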
