Published September 23, 2018 | Version v1.0.0

Dataset Open

  • 1. Spotify
  • 2. NYU


The OpenMIC-2018 dataset is made available through a collaboration between Spotify and MARL@NYU. Spotify also sponsored the cost of annotation; its contributions to open-source research can be found online at its developer site, engineering blog, and public GitHub.

If you use this dataset, please cite the following work:

Humphrey, Eric J., Durand, Simon, and McFee, Brian. “OpenMIC-2018: An Open Dataset for Multiple Instrument Recognition.” In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), 2018.

The dataset is made available by Spotify AB under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The full terms of this license are included alongside this dataset.

This dataset contains the following:

  • 10-second snippets of audio, in a directory format like 'audio/{0:.3}/{0}.ogg'.format(sample_key)
  • VGGish features as JSON objects, in a directory format like 'vggish/{0:.3}/{0}.json'.format(sample_key)
  • MD5 checksums for each OGG and JSON file
  • Anonymized individual responses, in ‘openmic-2018-individual-responses.csv’
  • Aggregated labels, in ‘openmic-2018-aggregated-labels.csv’
  • Track metadata, with licenses for each audio recording, in ‘openmic-2018-metadata.csv’
  • A Python-friendly NPZ file of features and labels, ‘openmic-2018.npz’
  • Sample partitions for train and test, in ‘partitions/*.txt’
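
The directory layout and checksums listed above can be exercised with a short Python sketch. It assumes that files are grouped under the first three characters of the sample key, which in Python's format mini-language is the precision spec `{0:.3}`. The root directory name, the example key `000046_3840`, and the helper names `sample_paths` and `md5_of_file` are illustrative assumptions, not part of the dataset's documentation; the layout of the checksum files is not described here, so the sketch only computes a digest to compare by hand.

```python
import hashlib
from pathlib import Path


def sample_paths(root, sample_key):
    """Return (audio_path, vggish_path) for one sample key.

    Assumption: each file lives in a subdirectory named after the first
    three characters of its key, which is what '{0:.3}' produces.
    """
    root = Path(root)
    audio = root / "audio/{0:.3}/{0}.ogg".format(sample_key)
    vggish = root / "vggish/{0:.3}/{0}.json".format(sample_key)
    return audio, vggish


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical root directory and sample key, for illustration only.
    audio_path, vggish_path = sample_paths("openmic-2018", "000046_3840")
    print(audio_path.as_posix())
    print(vggish_path.as_posix())
    if audio_path.exists():
        # Compare this digest against the published checksum for the file.
        print(md5_of_file(audio_path))
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for 10-second clips but makes the same helper usable on the full archive.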


Files (2.6 GB)

