🪴 Anil's Garden

Representing Speech Through Autoregressive Prediction of Cochlear Tokens

19 Dec 2025 · 1 min read

paper

Title: Representing Speech Through Autoregressive Prediction of Cochlear Tokens
Authors: Greta Tuckute, Klemen Kotar, Evelina Fedorenko, Daniel L. K. Yamins
Published: 15th August 2025 (Friday) @ 17:06:04
Link: http://arxiv.org/abs/2508.11598v1

Abstract

We introduce AuriStream, a biologically inspired model for encoding speech via a two-stage framework inspired by the human auditory processing hierarchy. The first stage transforms raw audio into a time-frequency representation based on the human cochlea, from which we extract discrete cochlear tokens. The second stage applies an autoregressive sequence model over the cochlear tokens. AuriStream learns meaningful phoneme and word representations, and state-of-the-art lexical semantics. AuriStream shows competitive performance on diverse downstream SUPERB speech tasks. Complementing AuriStream’s strong representational capabilities, it generates continuations of audio which can be visualized in a spectrogram space and decoded back into audio, providing insights into the model’s predictions. In summary, we present a two-stage framework for speech representation learning to advance the development of more human-like models that efficiently handle a range of speech-based tasks.
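
To make the two-stage idea concrete, here is a toy sketch in Python. It is not AuriStream: the abstract does not specify the cochlear front end, the tokenizer, or the sequence model, so each is replaced by a simple stand-in chosen only for illustration. A log-magnitude STFT stands in for the cochlea-based representation, a nearest-neighbour lookup against a codebook of sampled frames stands in for the tokenizer, and a smoothed bigram table stands in for the autoregressive model. All function names and parameters are hypothetical.

```python
import numpy as np

def spectrogram(wave, frame_len=256, hop=128):
    """Log-magnitude STFT; a crude stand-in for a cochlea-based representation."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(wave) - frame_len) // hop
    frames = np.stack([wave[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))  # (n_frames, frame_len // 2 + 1)

def tokenize(features, codebook):
    """Stage 1 (assumed form): map each frame to the index of its nearest codebook vector."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def fit_bigram(tokens, vocab_size):
    """Stage 2 stand-in: add-one-smoothed estimates of P(next token | current token)."""
    counts = np.ones((vocab_size, vocab_size))
    np.add.at(counts, (tokens[:-1], tokens[1:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)

def continue_tokens(tokens, probs, n_new):
    """Greedy autoregressive continuation: repeatedly append the most likely next token."""
    out = list(tokens)
    for _ in range(n_new):
        out.append(int(probs[out[-1]].argmax()))
    return np.array(out)

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
# Synthetic 1 s signal (a noisy 440 Hz tone), used here instead of real speech.
wave = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)

features = spectrogram(wave)
vocab_size = 16
# Hypothetical codebook: 16 frames sampled from the signal itself.
codebook = features[rng.choice(len(features), vocab_size, replace=False)]
tokens = tokenize(features, codebook)
probs = fit_bigram(tokens, vocab_size)
extended = continue_tokens(tokens, probs, n_new=10)
```

The continuation in `extended` is a sequence of token indices; looking each index up in `codebook` gives back spectrogram-like frames, which loosely mirrors the abstract's point that predicted tokens can be visualized in a spectrogram space.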

