🪴 Anil's Garden


Direct speech-to-speech translation with a sequence-to-sequence model

19 Dec 2025 · 1 min read

paper

Title: Direct speech-to-speech translation with a sequence-to-sequence model
Authors: Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu
Published: 12th April 2019 (Friday) @ 05:15:31
Link: http://arxiv.org/abs/1904.06037v2

Abstract

We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.
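To make the data flow in the abstract concrete, here is a minimal, untrained sketch of an attention-based encoder-decoder that reads source spectrogram frames and emits target spectrogram frames one at a time, with no text in between. It is not the paper's architecture: the single-layer encoder, dot-product attention, random weights, and every name here (`spectrogram_to_spectrogram`, `hidden`, `seed`) are assumptions chosen for illustration only.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, x):
    # Multiply a matrix (list of rows) by a vector.
    return [dot(row, x) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def spectrogram_to_spectrogram(source_frames, n_target_frames, target_dim,
                               hidden=8, seed=0):
    """Hypothetical toy model: random, untrained weights throughout."""
    rng = random.Random(seed)
    source_dim = len(source_frames[0])
    w_enc = random_matrix(hidden, source_dim, rng)
    w_dec = random_matrix(hidden, target_dim + 2 * hidden, rng)
    w_out = random_matrix(target_dim, 2 * hidden, rng)

    # Encoder: one hidden vector per source spectrogram frame.
    encoded = [[math.tanh(v) for v in matvec(w_enc, frame)]
               for frame in source_frames]

    state = [0.0] * hidden
    prev_frame = [0.0] * target_dim
    target_frames, attention = [], []
    for _ in range(n_target_frames):
        # Attention: score every encoder frame against the decoder state.
        weights = softmax([dot(state, h) for h in encoded])
        context = [sum(w * h[i] for w, h in zip(weights, encoded))
                   for i in range(hidden)]
        # Decoder: update the state from the previous output frame and context.
        state = [math.tanh(v)
                 for v in matvec(w_dec, prev_frame + context + state)]
        # Output: predict the next target spectrogram frame directly.
        frame = matvec(w_out, state + context)
        target_frames.append(frame)
        attention.append(weights)
        prev_frame = frame
    return target_frames, attention


# Usage: 5 source frames with 4 bins each, decoded into 3 target frames of 6 bins.
source = [[0.1 * (i + j) for j in range(4)] for i in range(5)]
frames, attn = spectrogram_to_spectrogram(source, n_target_frames=3, target_dim=6)
print(len(frames), len(frames[0]))  # 3 6
```

The source and target sequences can differ in both length and frame dimension, which is the property the attention mechanism provides. A trained system would additionally need a learned stopping criterion and a vocoder to turn the predicted spectrogram into a waveform; both are omitted here.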

