Text Normalization and Diphone Preparation for Bangla Speech Synthesis

This paper presents the methodologies involved in text normalization and diphone preparation for Bangla Text-to-Speech (TTS) synthesis. A concatenation-based TTS system basically comprises two modules: natural language processing (NLP) and digital signal processing (DSP). The natural language processing module converts text to its pronounceable form, a step called text normalization; selecting diphones based on the normalized text is called Grapheme-to-Phoneme (G2P) conversion. The text normalization issues addressed in this paper include tokenization, conjuncts, null-modified characters, numerical words, abbreviations, and acronyms. Issues related to diphone preparation include diphone categorization, corpus preparation, diphone labeling, and diphone selection. Appropriate rules and algorithms are proposed to tackle all of these issues. We developed a speech synthesizer for Bangla using a diphone-based concatenative approach, which is demonstrated to produce natural-sounding synthetic speech.
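
To illustrate the two stages summarized above, the following minimal Python sketch first expands abbreviations token by token and then derives the diphone sequence for a phoneme string. It is an illustration under stated assumptions, not the rules proposed in this paper: the function names, the silence symbol "_", the "x-y" diphone naming, the ASCII phoneme symbols, and the abbreviation table are all hypothetical placeholders.

```python
from typing import Dict, List

SILENCE = "_"  # hypothetical symbol for the silence at utterance boundaries


def normalize_tokens(text: str, abbreviations: Dict[str, str]) -> List[str]:
    """Split text on whitespace and expand any token found in the table.

    The abbreviation table is supplied by the caller; a real normalizer
    would also handle numbers, acronyms, and conjuncts.
    """
    return [abbreviations.get(token, token) for token in text.split()]


def to_diphones(phonemes: List[str], silence: str = SILENCE) -> List[str]:
    """Return the diphone units needed to concatenate a phoneme sequence.

    A diphone spans the transition from one phoneme to the next, so a
    sequence of n phonemes padded with silence on both sides yields
    n + 1 diphones.
    """
    if not phonemes:
        return []
    padded = [silence, *phonemes, silence]
    # Pair each phoneme with its successor to form the transition units.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    # Placeholder abbreviation and phoneme symbols, for illustration only.
    print(normalize_tokens("Dr. Rahman", {"Dr.": "Doctor"}))
    print(to_diphones(["k", "a", "l"]))
```

In this sketch, a word with three phonemes requires four diphones, including the two boundary units with silence. This is why diphone preparation has to cover transitions into and out of silence as well as transitions between phonemes.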