Language Model Benchmarks

See also NLP-progress by Sebastian Ruder, a repository that tracks progress in Natural Language Processing (NLP), including the datasets and the current state of the art for the most common NLP tasks.

MMLU

MMLU-Pro

GPQA

HellaSwag: Can a Machine Really Finish Your Sentence? (Zellers et al., 2019)

  • Natural language inference

MATH-500

  • Reasoning

LiveCodeBench

  • Coding: code generation, plus self-repair, code execution, and test output prediction

WikiSQL (Zhong et al., 2017)

  • NL to SQL queries
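
As a rough illustration of the NL-to-SQL task, the sketch below pairs a question with a SQL query and runs it against an in-memory SQLite table. The table, its columns, the question, and both queries are made up for illustration and are not taken from WikiSQL. It assumes scoring by execution match (the predicted query counts as correct when it returns the same rows as the reference query), which is one common way to score this kind of task.

```python
import sqlite3

# Hypothetical toy table; none of this data comes from WikiSQL.
ROWS = [
    ("Alice", "Canada", 31),
    ("Bob", "France", 27),
    ("Chloe", "Canada", 24),
]


def run_query(sql):
    """Execute `sql` against a fresh in-memory copy of the toy table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE players (name TEXT, country TEXT, age INTEGER)")
        conn.executemany("INSERT INTO players VALUES (?, ?, ?)", ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def execution_match(predicted_sql, reference_sql):
    """Return True when both queries yield the same rows (row order ignored)."""
    try:
        predicted = run_query(predicted_sql)
    except sqlite3.Error:
        # A prediction that fails to execute counts as a miss.
        return False
    return sorted(predicted) == sorted(run_query(reference_sql))


# Natural-language question with a reference query and a model-style prediction.
question = "How many players are from Canada?"
reference_sql = "SELECT COUNT(*) FROM players WHERE country = 'Canada'"
predicted_sql = "SELECT COUNT(name) FROM players WHERE country = 'Canada'"

print(question)
print(execution_match(predicted_sql, reference_sql))
```

Here the predicted query is written differently from the reference but returns the same result, so it is counted as a match. A stricter metric would compare the query text or structure instead of the result.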

GLUE

SuperGLUE

SAMSum

  • Conversation summarisation