Comparison of Grapheme-to-Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary

Grapheme-to-Phoneme (G2P) conversion is the task of predicting the pronunciation of a word from its graphemic or written form. It is an essential component of both automatic speech recognition (ASR) and text-to-speech (TTS) systems. In this paper, we evaluate seven G2P conversion approaches on a manually tagged Myanmar phoneme dictionary: Adaptive Regularization of Weight Vectors (AROW) based structured learning (S-AROW), Conditional Random Fields (CRF), joint-sequence models (JSM), phrase-based statistical machine translation (PBSMT), Recurrent Neural Networks (RNN), Support Vector Machine (SVM) based point-wise classification, and Weighted Finite-State Transducers (WFST). The G2P bootstrapping results were evaluated both with automatic phoneme error rate (PER) calculation and by manual checking of voiced/unvoiced, tone, consonant, and vowel errors. The results show that CRF, PBSMT, and WFST are the best-performing methods for G2P conversion on the Myanmar language.
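To make the automatic evaluation concrete, the sketch below shows one common way to compute phoneme error rate (PER): the Levenshtein edit distance between predicted and reference phoneme sequences, normalized by the total number of reference phonemes. This is a minimal illustration under assumed conventions; the phoneme symbols and aggregation details are placeholders, not the paper's exact setup.

```python
# Minimal PER sketch: edit distance over phoneme sequences, summed over a corpus.
# The example phonemes below are hypothetical romanized placeholders.

def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of strings)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER: total edit operations divided by total reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_phones = sum(len(r) for r in references)
    return total_edits / total_phones

if __name__ == "__main__":
    refs = [["k", "a1", "l", "a3"], ["m", "j", "a2", "n", "m", "a2"]]
    hyps = [["k", "a1", "l", "a2"], ["m", "j", "a2", "n", "m", "a2"]]
    print(f"PER = {phoneme_error_rate(refs, hyps):.3f}")
```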
