Grapheme-to-Phoneme Conversion Based on a Fast TBL Algorithm in Mandarin TTS Systems

Grapheme-to-phoneme (G2P) conversion is an important subcomponent of many speech processing systems. The difficulty in Chinese G2P conversion lies in picking the one correct pronunciation of a polyphone from several candidates, using context information such as part of speech, lexical words, word length, and the position of the polyphone in a word or a sentence. By evaluating the distribution of polyphones in a large text corpus with correct pinyin transcriptions, this paper shows that correct G2P conversion for 78 key polyphones greatly decreases the overall error rate. The paper proposes a fast transformation-based error-driven learning (TBL) algorithm for G2P conversion. Accuracy improves both for polyphones whose original accuracy was high and for those whose original accuracy was low. Compared with a decision tree algorithm, the TBL algorithm performs better on the polyphone problem.
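As a rough illustration of the error-driven idea behind TBL, the sketch below runs the standard greedy loop on one polyphone: start from a most-frequent-pronunciation baseline, then repeatedly add the context rule that fixes the most remaining errors. It is not the fast variant the paper proposes, whose details the abstract does not give. The toy corpus, the two context slots (previous and next character), and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy corpus: (previous char, polyphone, next char, gold pinyin).
SAMPLES = [
    ("银", "行", "的", "hang2"),
    ("旅", "行", "者", "xing2"),
    ("不", "行", "了", "xing2"),
    ("银", "行", "家", "hang2"),
    ("步", "行", "去", "xing2"),
    ("一", "行", "字", "hang2"),
    ("可", "行", "性", "xing2"),
]


def baseline(samples):
    """Map each polyphone to its most frequent pinyin in the corpus."""
    counts = {}
    for _, char, _, gold in samples:
        counts.setdefault(char, Counter())[gold] += 1
    return {char: c.most_common(1)[0][0] for char, c in counts.items()}


def matches(rule, sample, label):
    """A rule fires when the polyphone, the context slot, and the current label all match."""
    char, slot, value, old, _new = rule
    return sample[1] == char and sample[slot] == value and label == old


def apply_rule(rule, samples, labels):
    """Relabel every sample on which the rule fires."""
    return [rule[4] if matches(rule, s, l) else l for s, l in zip(samples, labels)]


def score(rule, samples, labels):
    """Net gain of a rule: errors fixed minus correct labels broken."""
    gain = 0
    for s, l in zip(samples, labels):
        if matches(rule, s, l):
            gain += 1 if rule[4] == s[3] else 0
            gain -= 1 if l == s[3] else 0
    return gain


def learn(samples, min_gain=1):
    """Greedy TBL: keep adding the best-scoring rule until none gains enough."""
    base = baseline(samples)
    labels = [base[s[1]] for s in samples]
    rules = []
    while True:
        # Propose one candidate per (error, context slot); slots 0 and 2 are prev/next char.
        candidates = []
        for s, l in zip(samples, labels):
            if l != s[3]:
                for slot in (0, 2):
                    rule = (s[1], slot, s[slot], l, s[3])
                    if rule not in candidates:
                        candidates.append(rule)
        if not candidates:
            break
        best = max(candidates, key=lambda r: score(r, samples, labels))
        if score(best, samples, labels) < min_gain:
            break
        rules.append(best)
        labels = apply_rule(best, samples, labels)
    return base, rules


def predict(base, rules, prev, char, nxt):
    """Start from the baseline label, then apply the learned rules in order."""
    sample = (prev, char, nxt, None)
    label = base[char]
    for rule in rules:
        if matches(rule, sample, label):
            label = rule[4]
    return label


BASE, RULES = learn(SAMPLES)
print(predict(BASE, RULES, "银", "行", "卡"))
print(predict(BASE, RULES, "飞", "行", "员"))
```

Each learned rule is an ordered patch on top of the baseline, so the rule list can be read directly as "change pronunciation A to B in context X". This sketch rescans every sample to score every candidate in each round, which becomes costly on a large corpus.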
