Bangla grapheme to phoneme conversion using conditional random fields

Integrated into handheld devices, toys, kiosks, and call centers, Text to Speech (TTS) and Speech Recognition (SR) have become widely used applications in everyday life. One of the core components of these applications is Grapheme to Phoneme (G2P) conversion. The task is to map the written form of a word to its spoken form, i.e., to map one sequence to another. In Natural Language Processing (NLP), it is typically treated as a sequence-to-sequence labeling task. The task, however, is language dependent and has primarily been addressed for English and similar resource-rich languages. In comparison, very little has been done for digitally under-resourced languages such as Bangla (ethnonym: Bangla; exonym: Bengali). The current state of the art in Bangla G2P conversion is limited to rule-based and lexicon-based approaches, whose development requires substantial input from linguistic experts. In this paper, we propose a data-driven machine learning approach for Bangla G2P conversion. We evaluate the existing rule-based approaches and design a machine learning model using Conditional Random Fields (CRFs). To train the models, we use only character-level contextual features, because extracting hand-crafted features requires specialized knowledge. We evaluate the systems on two publicly available datasets, the CRBLP and Google pronunciation lexicons, and obtain promising results, with phoneme error rates of 1.51% and 14.88%, respectively.
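The abstract names two concrete ingredients: character-level contextual features as the only CRF input, and phoneme error rate (PER) as the evaluation metric. The sketch below illustrates both under stated assumptions. The ±2 character window, the feature names, the bigram feature, and the `<BOS>`/`<EOS>` padding symbols are hypothetical choices, not the paper's actual feature template. PER is assumed to follow the common definition: total Levenshtein edit distance between predicted and reference phoneme sequences divided by the total number of reference phonemes. The per-character string-valued dictionaries have the shape that common CRF wrappers (e.g. sklearn-crfsuite) accept as token features, but no particular toolkit is implied.

```python
from typing import Dict, List, Sequence

PAD_LEFT, PAD_RIGHT = "<BOS>", "<EOS>"  # hypothetical padding symbols


def char_features(word: str, i: int, window: int = 2) -> Dict[str, str]:
    """Character-level context features for the i-th character of `word`.

    Assumed template: the current character, every character within
    +/- `window` positions (padded at word edges), and the bigram formed
    with the previous character.
    """
    feats = {"c[0]": word[i]}
    for off in range(1, window + 1):
        feats[f"c[-{off}]"] = word[i - off] if i - off >= 0 else PAD_LEFT
        feats[f"c[+{off}]"] = word[i + off] if i + off < len(word) else PAD_RIGHT
    prev = word[i - 1] if i > 0 else PAD_LEFT
    feats["c[-1]c[0]"] = prev + word[i]
    return feats


def word_to_features(word: str, window: int = 2) -> List[Dict[str, str]]:
    """One feature dict per character: the CRF's input sequence for `word`."""
    return [char_features(word, i, window) for i in range(len(word))]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = cur
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """PER in percent: total edit distance / total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference phoneme sequences are empty")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / total
```

For example, `word_to_features("আমি")` yields three feature dictionaries, one per Unicode code point (আ, ম, and the vowel sign ি), and a single prediction that substitutes one of three reference phonemes gives a PER of about 33.3%. Because a CRF assigns one label per input character, training such a model also assumes a character-to-phoneme alignment of the lexicon entries, a step this sketch does not cover.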