This paper presents methodologies for text normalization and diphone preparation in Bangla Text to Speech (TTS) synthesis. A concatenation-based TTS system comprises two main modules: natural language processing (NLP) and digital signal processing (DSP). The NLP module converts text to its pronounceable form, a step called text normalization, and then selects diphones based on the normalized text, a step called Grapheme to Phoneme (G2P) conversion. The text normalization issues addressed in this paper include tokenization, conjuncts, null modified characters, numerical words, abbreviations and acronyms. The issues related to diphone preparation include diphone categorization, corpus preparation, diphone labeling and diphone selection. Appropriate rules and algorithms are proposed to tackle each of these issues. We developed a speech synthesizer for Bangla using a diphone-based concatenative approach, which is demonstrated to produce natural-sounding synthetic speech.
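The two stages named above, text normalization followed by diphone selection, can be illustrated with a minimal Python sketch. It is not the paper's rule set. The lookup tables, the function names (`normalize`, `to_diphones`), the digit-by-digit reading of numbers, the silence marker `_` and the Latin phoneme labels are all assumptions chosen for illustration. A real Bangla normalizer would read numbers positionally and would also handle conjuncts and null modified characters.

```python
# Illustrative lookup tables (assumptions, not taken from the paper).
BANGLA_DIGITS = {
    "০": "শূন্য", "১": "এক", "২": "দুই", "৩": "তিন", "৪": "চার",
    "৫": "পাঁচ", "৬": "ছয়", "৭": "সাত", "৮": "আট", "৯": "নয়",
}
ABBREVIATIONS = {"ডা.": "ডাক্তার"}  # hypothetical one-entry abbreviation table


def tokenize(text):
    """Split the input on whitespace; real tokenization needs more rules."""
    return text.split()


def normalize_token(token):
    """Expand one token into a list of pronounceable words."""
    if token in ABBREVIATIONS:
        return [ABBREVIATIONS[token]]
    if token and all(ch in BANGLA_DIGITS for ch in token):
        # Simplification: read the number digit by digit.
        return [BANGLA_DIGITS[ch] for ch in token]
    return [token]


def normalize(text):
    """Tokenize the text and expand abbreviations and digit strings."""
    words = []
    for token in tokenize(text):
        words.extend(normalize_token(token))
    return words


def to_diphones(phonemes, silence="_"):
    """Pair each phoneme with its successor, padding both ends with silence."""
    padded = [silence] + list(phonemes) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    print(normalize("ডা. ১২"))
    # Hypothetical phoneme labels for a three-phoneme word.
    print(to_diphones(["k", "a", "l"]))
```

A sequence of n phonemes yields n + 1 diphone names under this scheme, because each name spans the transition between two neighbouring units, including the silences at the word boundaries. A synthesizer would then look up each name in its labeled diphone inventory and concatenate the matching segments.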