Title: Epitran: Precision G2P for Many Languages
Authors: David R. Mortensen, Siddharth Dalmia, Patrick Littell
Published: 2018-05-01
Link: https://aclanthology.org/L18-1429/

Abstract

Epitran is a massively multilingual grapheme-to-phoneme (G2P) system. To maximize its usefulness, it is written in Python and distributed as open-source software under an MIT license. Out of the box, it supports 61 languages. Additional languages can easily be added, either through a simple rule-based framework or by adding other back-ends. It has a number of advantages over other G2P and romanization packages such as Unitran and URoman, including sensible handling of different Latin scripts, precision transduction for each language-script pair (important when multiple languages use the same script differently), and proper use of international and de facto standards for phonetic representation (IPA, X-SAMPA).
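To illustrate why transduction should be keyed to a language-script pair rather than to a script alone, here is a minimal sketch of a rule-based G2P mapper. It assumes a greedy longest-match over grapheme-to-IPA tables. The codes `spa-Latn` and `deu-Latn`, the `TOY_MAPS` tables, and the `transliterate` function are hypothetical illustrations, not Epitran's actual mapping files or API. The point is that the same Latin-script spelling yields different IPA depending on which language's table is selected.

```python
# Hypothetical toy grapheme-to-IPA tables, keyed by language-script codes.
# These are illustrative only; they are NOT Epitran's real mapping data.
TOY_MAPS: dict[str, dict[str, str]] = {
    "spa-Latn": {"ch": "tʃ", "j": "x", "a": "a", "t": "t"},
    "deu-Latn": {"sch": "ʃ", "ch": "x", "j": "j", "a": "a", "t": "t"},
}


def transliterate(code: str, text: str) -> str:
    """Greedy longest-match G2P using the toy table for `code` (hypothetical helper)."""
    table = TOY_MAPS[code]  # KeyError if the language-script pair is unknown
    max_len = max(len(g) for g in table)
    text = text.lower()
    out: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest grapheme first so multigraphs like "sch" win over "s".
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i : i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No rule matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("ja", "acht"):
        print(word, transliterate("spa-Latn", word), transliterate("deu-Latn", word))
```

With these toy tables, "acht" comes out as `atʃt` under `spa-Latn` but `axt` under `deu-Latn`, and "Schach" maps to `ʃax` under `deu-Latn` because the three-letter grapheme "sch" is tried before shorter ones. A real system would also need context-sensitive rules and preprocessing, which this sketch omits.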