Low-Resource Machine Transliteration Using Recurrent Neural Networks of Asian Languages

Grapheme-to-phoneme models are key components of automatic speech recognition and text-to-speech systems. They are particularly useful for low-resource language pairs that lack well-developed pronunciation lexicons. These models are based on initial alignments between grapheme source and phoneme target sequences. Inspired by sequence-to-sequence recurrent neural network--based translation methods, this work presents an approach that applies an alignment representation for input sequences, together with pretrained source and target embeddings, to address the transliteration problem for a low-resource language pair. Experiments on French and Vietnamese showed that, with only a small bilingual pronunciation dictionary available for training the transliteration models, the approach obtained promising results: a large increase in BLEU scores and a reduction in Translation Error Rate (TER) and Phoneme Error Rate (PER). We also compared the proposed neural network--based transliteration approach with a statistical one.
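To make the Phoneme Error Rate concrete, the following is a minimal sketch of how such a metric is commonly computed: the total Levenshtein edit distance between hypothesis and reference phoneme sequences, divided by the total number of reference phonemes. The function names `edit_distance` and `phoneme_error_rate` and the placeholder phoneme symbols are assumptions made for illustration. The abstract does not specify the exact scoring procedure used in the experiments, so this may differ from it in detail.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference phoneme
            insertion = curr[j - 1] + 1  # extra phoneme in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level PER: total edits divided by total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of entries")
    total_ref_len = sum(len(r) for r in refs)
    if total_ref_len == 0:
        raise ValueError("reference set contains no phonemes")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_ref_len


# Placeholder symbols, not real French or Vietnamese phonemes.
references = [["p1", "p2", "p3", "p4"]]
hypotheses = [["p1", "p2", "p4"]]
per = phoneme_error_rate(references, hypotheses)
```

In this toy case the hypothesis is missing one of four reference phonemes, so `per` evaluates to 0.25. Lower values indicate output closer to the reference pronunciation.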
