A transformation-based learning approach to language identification for mixed-lingual text-to-speech synthesis

Recent progress in corpus-based concatenative text-to-speech synthesis has generated interest in systems capable of synthesizing text from more than one language. In this paper we describe the language identification component of such a mixed-lingual text-to-speech system. Relying only on the input text, we employ two methods, a transformation-based learning approach and a stochastic n-gram approach, and we describe how the two are combined. The transformation-based learning approach alone already achieves average error rates below 2 percent and outperforms the n-gram classification scheme; combining both methods yields a further error reduction of up to 50 percent.
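To make the stochastic n-gram idea concrete, the following is a minimal sketch of word-level language identification from character trigrams with add-one smoothing. It is not the system described in the paper: the function names, the toy word lists, the language labels `en` and `de`, and the smoothing scheme are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with '_' at both ends."""
    padded = "_" * (n - 1) + word.lower() + "_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train(corpora, n=3):
    """Count n-grams per language; corpora maps a language label to a word list."""
    counts = {
        lang: Counter(g for w in words for g in char_ngrams(w, n))
        for lang, words in corpora.items()
    }
    # Shared vocabulary size (plus one slot for unseen n-grams) so that
    # smoothed scores are comparable across languages.
    vocab_size = len(set().union(*counts.values())) + 1
    return counts, vocab_size

def log_score(word, lang_counts, vocab_size, n=3):
    """Add-one smoothed log-likelihood of the word's n-grams under one language."""
    total = sum(lang_counts.values())
    return sum(
        math.log((lang_counts[g] + 1) / (total + vocab_size))
        for g in char_ngrams(word, n)
    )

def identify(word, counts, vocab_size, n=3):
    """Pick the language whose n-gram counts give the word the highest score."""
    return max(counts, key=lambda lang: log_score(word, counts[lang], vocab_size, n))

# Hypothetical toy training data, far too small for real use.
corpora = {
    "en": ["the", "this", "that", "with", "which", "weather", "thing", "through"],
    "de": ["der", "die", "das", "und", "nicht", "schon", "zwischen", "ich", "auch", "sich"],
}
counts, vocab_size = train(corpora)
labels = [identify(w, counts, vocab_size) for w in ["then", "nichts"]]
```

A classifier of this kind labels each word independently. A transformation-based learner, by contrast, starts from an initial labeling and applies learned correction rules that can take neighboring words and labels into account, which is one reason the two methods can complement each other.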
