Title: Grapheme-to-Phoneme Models for (Almost) Any Language
Authors: Aliya Deri, Kevin Knight
Published: 2016-08-01
Link: https://aclanthology.org/P16-1038/

Abstract

Grapheme-to-phoneme (g2p) models are rarely available in low-resource languages, as the creation of training and evaluation data is expensive and time-consuming. We use Wiktionary to obtain more than 650k word-pronunciation pairs in more than 500 languages. We then develop phoneme and language distance metrics based on phonological and linguistic knowledge; applying those, we adapt g2p models for high-resource languages to create models for related low-resource languages. We provide results for models for 229 adapted languages.
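To make the idea of a phonologically grounded phoneme distance concrete, here is a minimal sketch. It is not the metric defined in the paper. It assumes that each phoneme is described by a set of binary phonological features, counts the features on which two phonemes differ, and uses that count to map phonemes a high-resource model emits onto the closest phoneme in a related low-resource language's inventory. The feature table and every function name (`phoneme_distance`, `nearest_in_inventory`, `adapt_pronunciation`) are hypothetical and for illustration only.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical, heavily simplified feature table: each IPA symbol maps to the
# set of binary phonological features it carries. Real feature inventories
# are much richer; this table is an illustrative assumption.
FEATURES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"consonantal", "labial"}),
    "b": frozenset({"consonantal", "labial", "voiced"}),
    "t": frozenset({"consonantal", "coronal"}),
    "d": frozenset({"consonantal", "coronal", "voiced"}),
    "θ": frozenset({"consonantal", "coronal", "continuant"}),
    "s": frozenset({"consonantal", "coronal", "continuant", "strident"}),
}


def phoneme_distance(a: str, b: str) -> int:
    """Count the features present in exactly one of the two phonemes."""
    return len(FEATURES[a] ^ FEATURES[b])


def nearest_in_inventory(phoneme: str, inventory: Iterable[str]) -> str:
    """Return the inventory phoneme with the smallest feature distance.

    min() keeps the first minimum it finds, so ties go to whichever
    phoneme appears earliest in the inventory.
    """
    return min(inventory, key=lambda q: phoneme_distance(phoneme, q))


def adapt_pronunciation(phones: List[str], inventory: Iterable[str]) -> List[str]:
    """Replace each phone missing from the target inventory with its nearest match."""
    inv = list(inventory)
    return [p if p in inv else nearest_in_inventory(p, inv) for p in phones]


if __name__ == "__main__":
    # A target language lacking voiced stops: /b/ -> /p/, /d/ -> /t/.
    print(adapt_pronunciation(["b", "d", "s"], ["p", "t", "s"]))
```

Counting differing features treats every feature as equally important. A real metric would probably weight features such as place or manner of articulation differently, so read the unweighted count as the simplest possible choice, not a recommendation.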


Drive folder with the data and models used and built

This folder includes the data and models used in and built for Grapheme-to-Phoneme Models for (Almost) Any Language (Deri and Knight, ACL 2016). Refer to the paper for further information about how these resources were created. Please cite the paper if you use these resources.

The WFST models run with Carmel (http://www.isi.edu/licensed-sw/carmel/). OpenFst versions of these models are also available upon request.