SUPERB

Excerpt

A comprehensive and reproducible benchmark for Self-supervised Speech Representation Learning


Speech processing Universal PERformance Benchmark

SUPERB is a collection of benchmarking resources to evaluate the capability of a universal shared representation for speech processing. SUPERB consists of the following:

  1. A benchmark of ten speech processing tasks[1] built on established public datasets,
  2. A toolkit designed to evaluate and analyze pretrained model performance on various downstream tasks following the conventional evaluation protocols from speech communities,
  3. A public leaderboard for performance tracking on the benchmark.

SUPERB aims to offer the community a standard and comprehensive framework to train, evaluate, and compare the generalizability of universal speech representations on speech processing tasks. A universal speech representation can be adapted quickly to diverse downstream tasks with minimal architectural change and downstream fine-tuning, which shortens the model development cycle for new tasks. To keep the focus on the quality of the learned universal representation, SUPERB places an explicit constraint on the downstream model and limits its parameter size.
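
To make the parameter-size constraint concrete, here is a minimal sketch of checking a small downstream head against a budget. It assumes the upstream representation is kept fixed and that the downstream model is a single linear layer. The function names, the feature dimension, the class count, and the budget are all hypothetical values chosen for illustration; none of them come from SUPERB.

```python
# Hypothetical illustration: none of these names or numbers come from SUPERB.

def linear_head_params(feature_dim: int, num_classes: int) -> int:
    """Trainable parameters of one linear layer (weights plus biases)."""
    return feature_dim * num_classes + num_classes


def within_budget(num_params: int, max_params: int) -> bool:
    """True if the downstream model fits the allowed parameter budget."""
    return num_params <= max_params


# Assumed values, for illustration only.
FEATURE_DIM = 768      # size of each upstream feature vector
NUM_CLASSES = 12       # a small classification task
MAX_PARAMS = 100_000   # made-up cap on downstream parameters

head_params = linear_head_params(FEATURE_DIM, NUM_CLASSES)
print(head_params, within_budget(head_params, MAX_PARAMS))
```

Because the head is deliberately small, differences in task performance are more likely to reflect the quality of the shared representation than the capacity of the downstream model.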

The ultimate goal of SUPERB is to democratize advances in speech processing through powerful, generalizable, and reusable speech representations. SUPERB is a continuously developed project with long-term maintenance. As we gradually release new tasks and open new tracks, we invite researchers to participate in the challenge and advance the research frontier together.

Acknowledgement


We thank and for creating and maintaining the SUPERB official website.