Incorporating syllabification points into a model of grapheme-to-phoneme conversion

A grapheme-to-phoneme (G2P) conversion model is a crucial component in natural language processing. It is generally developed using a probabilistic data-driven approach and applied directly to a sequence of graphemes with no additional information. Previous research shows that incorporating syllabification points can improve a probabilistic English G2P model. However, that information must be supplied accurately by a perfect orthographic syllabification; noise or errors in syllabification significantly reduce G2P performance. This paper investigates the incorporation of syllabification points into a probabilistic G2P model for Bahasa Indonesia. This information is important since Bahasa Indonesia is richer than English in terms of syllables. A 5-fold cross-validation on 50 k words shows that incorporating syllabification points significantly improves the performance of the G2P model, giving a relative reduction in phoneme error rate (PER) of 10.75%. The resulting PER is much lower than that of a G2P model based on an inductive learning algorithm. An important contribution of this research is that the proposed G2P model is quite robust to syllabification errors. A syllable error rate (SER) of 2.5% produced by an orthographic syllabification model only slightly increases the PER of the proposed G2P model, from 0.83% to 0.90%. A higher SER of up to 10% increases the PER only to 1.14%.
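
The error figures above can be made concrete with a small sketch. The abstract does not define how PER is computed, so the code below assumes a common definition: the total edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The function names (`edit_distance`, `phoneme_error_rate`, `relative_change`) are illustrative and do not come from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """PER in percent, under the assumed edit-distance definition."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent of old."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Toy phoneme sequences, invented for illustration only.
    refs = [["m", "a", "k", "a", "n"]]
    hyps = [["m", "a", "k", "a", "m"]]
    print(f"toy PER: {phoneme_error_rate(refs, hyps):.2f}%")

    # PER values reported in the abstract for the proposed model.
    print(f"SER 2.5%: {relative_change(0.83, 0.90):.2f}% relative increase")
    print(f"SER 10%:  {relative_change(0.83, 1.14):.2f}% relative increase")
```

Under this reading, the reported rise from 0.83% to 0.90% is a relative increase of roughly 8.4%, and the rise to 1.14% is roughly 37%, while the absolute PER stays close to 1% in both cases.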
