🪴 Anil's Garden

Uncovering Latent Style Factors for Expressive Speech Synthesis

18 Jul 2025, 1 min read

paper

Title: Uncovering Latent Style Factors for Expressive Speech Synthesis
Authors: Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark, Rif A. Saurous
Published: 1st November 2017 (Wednesday) @ 19:40:00
Link: http://arxiv.org/abs/1711.00520v1

Abstract

Prosodic modeling is a core problem in speech synthesis. The key challenge is producing desirable prosody from textual input containing only phonetic information. In this preliminary study, we introduce the concept of “style tokens” in Tacotron, a recently proposed end-to-end neural speech synthesis model. Using style tokens, we aim to extract independent prosodic styles from training data. We show that without annotation data or an explicit supervision signal, our approach can automatically learn a variety of prosodic variations in a purely data-driven way. Importantly, each style token corresponds to a fixed style factor regardless of the given text sequence. As a result, we can control the prosodic style of synthetic speech in a somewhat predictable and globally consistent way.
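
The claim that each style token maps to a fixed style factor regardless of the text can be illustrated with a toy sketch. This is not the paper's architecture: the abstract gives no implementation details, so everything below (the names `style_tokens`, `style_embedding`, `condition_on_style`, the sizes, the random initialisation, and the choice to add the style vector to every text frame) is a hypothetical stand-in. In the actual model the tokens would be learned jointly with Tacotron rather than drawn at random.

```python
import random

# Hypothetical sizes, chosen only for illustration.
NUM_TOKENS = 4
EMBED_DIM = 8

random.seed(0)

# A bank of style token embeddings. Here they are random placeholders;
# in a trained model they would be learned parameters.
style_tokens = [
    [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(NUM_TOKENS)
]


def one_hot(index):
    """Weights that select exactly one style token."""
    return [1.0 if i == index else 0.0 for i in range(NUM_TOKENS)]


def style_embedding(weights):
    """Weighted sum of the token embeddings (one weight per token)."""
    if len(weights) != NUM_TOKENS:
        raise ValueError("expected one weight per style token")
    return [
        sum(w * token[d] for w, token in zip(weights, style_tokens))
        for d in range(EMBED_DIM)
    ]


def condition_on_style(text_states, weights):
    """Add the same style vector to every text frame.

    Assumption: additive conditioning is used here only because it is
    the simplest way to show a text-independent style offset.
    """
    style = style_embedding(weights)
    return [[x + s for x, s in zip(frame, style)] for frame in text_states]


# Two "texts" of different lengths, represented as all-zero frames so the
# style offset is directly visible in the output.
short_text = [[0.0] * EMBED_DIM for _ in range(3)]
long_text = [[0.0] * EMBED_DIM for _ in range(7)]

styled_short = condition_on_style(short_text, one_hot(2))
styled_long = condition_on_style(long_text, one_hot(2))
```

Selecting token 2 shifts every frame of both inputs by the same vector, which is the sense in which the style is "globally consistent" in this toy setting. Passing non-one-hot weights to `style_embedding` would blend several tokens, again as an assumption of this sketch rather than a description of the paper.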
