🪴 Anil's Garden


SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

19 Dec 2025 · 1 min read

paper
speech

Title: SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Authors: Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka
Published: 14th August 2023 (Monday) @ 01:01:19
Link: http://arxiv.org/abs/2308.06873v1

Abstract

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX’s efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks. See https://aka.ms/speechx for demo samples.
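To make the phrase "task-dependent prompting" concrete, here is a minimal sketch of how one model input could be assembled per task. It is an illustration only: the marker strings, the task names, the ordering of the segments, and the `build_prompt` helper are all assumptions made for this note, not details taken from the abstract.

```python
from typing import Dict, List, Optional

# Hypothetical task markers. The abstract only says prompting is task-dependent;
# these strings and the choice of which tasks get one are assumptions.
TASK_MARKERS: Dict[str, Optional[str]] = {
    "zero_shot_tts": None,  # assumed: no marker
    "noise_suppression": "<ns>",
    "speech_removal": "<sr>",
    "target_speaker_extraction": "<tse>",
}


def build_prompt(task: str, text_tokens: List[str], acoustic_tokens: List[str]) -> List[str]:
    """Concatenate text tokens, an optional task marker, and acoustic (codec) tokens.

    The segment order is an assumption chosen for illustration.
    """
    if task not in TASK_MARKERS:
        raise ValueError(f"unknown task: {task}")
    prompt = list(text_tokens)
    marker = TASK_MARKERS[task]
    if marker is not None:
        prompt.append(marker)
    prompt.extend(acoustic_tokens)
    return prompt


# Usage: the same function serves every task; only the marker changes.
example = build_prompt("noise_suppression", ["h", "i"], ["a1", "a2"])
```

The point of the sketch is that a single sequence model can be shared across tasks, with the prompt alone telling it which transformation to apply, which is what makes the approach extensible to new tasks.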


