This page lists some speech-related research conducted by the team led by Xu Tan. The research topics cover text to speech, singing voice synthesis, music generation, automatic speech recognition, etc. Some of this research is open-sourced via NeuralSpeech and Muzic.
We are hiring researchers in audio/video generation and LLMs. Please contact tanxu2012@gmail.com if you are interested.
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
February 08, 2024
PromptTTS 2: Describing and Generating Voices with Text Prompt
September 07, 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
April 19, 2023
PromptTTS: Controllable Text-to-Speech with Text Descriptions
November 22, 2022
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
August 30, 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
May 29, 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
May 03, 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
April 02, 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
March 06, 2022
Speech-T: Transducer for Text to Speech and Beyond
October 06, 2021
TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method
September 21, 2021
DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling
August 16, 2021
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior
June 11, 2021
AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style
June 02, 2021
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
March 05, 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
March 01, 2021
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
February 10, 2021
SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint
December 14, 2020
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
November 03, 2020
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
October 14, 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
September 02, 2020
PopMAG: Pop Music Accompaniment Generation
August 01, 2020
UWSpeech: Speech to Speech Translation for Unwritten Languages
June 12, 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
May 09, 2020
Semi-Supervised Neural Architecture Search
March 01, 2020
DeepSinger: Singing Voice Synthesis with Data Mined From the Web
February 14, 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
February 02, 2020
FastSpeech: Fast, Robust and Controllable Text to Speech
May 10, 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition
April 10, 2019