Comparison of Grapheme-to-Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary

Grapheme-to-Phoneme (G2P) conversion is the task of predicting the pronunciation of a word from its graphemic or written form. It is an essential component of both automatic speech recognition (ASR) and text-to-speech (TTS) systems. In this paper, we evaluate seven G2P conversion approaches on a manually tagged Myanmar phoneme dictionary: Adaptive Regularization of Weight Vectors (AROW) based structured learning (S-AROW), Conditional Random Fields (CRF), joint-sequence models (JSM), phrase-based statistical machine translation (PBSMT), Recurrent Neural Networks (RNN), Support Vector Machine (SVM) based point-wise classification, and Weighted Finite-State Transducers (WFST). The G2P bootstrapping results were evaluated both with automatic phoneme error rate (PER) calculation and by manual checking of voiced/unvoiced, tone, consonant, and vowel errors. The results show that CRF, PBSMT, and WFST are the best-performing methods for G2P conversion on the Myanmar language.
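To make the automatic evaluation concrete, the sketch below shows one common way to compute phoneme error rate (PER): the Levenshtein edit distance between predicted and reference phoneme sequences, normalized by the total number of reference phonemes. This is a minimal illustration under assumed conventions; the phoneme symbols and aggregation details are placeholders, not the paper's exact setup.

```python
# Minimal PER sketch: edit distance over phoneme sequences, summed over a corpus.
# The example phonemes below are hypothetical romanized placeholders.

def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of strings)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER: total edit operations divided by total reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_phones = sum(len(r) for r in references)
    return total_edits / total_phones

if __name__ == "__main__":
    refs = [["k", "a1", "l", "a3"], ["m", "j", "a2", "n", "m", "a2"]]
    hyps = [["k", "a1", "l", "a2"], ["m", "j", "a2", "n", "m", "a2"]]
    print(f"PER = {phoneme_error_rate(refs, hyps):.3f}")
```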
