Machine learning algorithm for automatic labeling and its application in text-to-speech conversion

In this paper we present a novel machine learning approach to text labeling problems. We illustrate the importance of this problem for text-to-speech (TTS) systems and, through them, for telecommunication applications. We then introduce the proposed method and demonstrate its effectiveness on language identification, using three different training sets and large test corpora.
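
The abstract does not describe the proposed algorithm, so the following is not the paper's method. It is a minimal sketch of the word-level labeling task itself, assuming a simple character-trigram classifier with add-one smoothing (a swapped-in technique): each word receives the language label whose trigram model gives it the highest log-probability. The tiny training lists, the labels `hungarian` and `english`, and the helpers `trigrams`, `train` and `label` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training data: label -> list of example words.
TRAINING = {
    "hungarian": ["utca", "szép", "köszönöm", "város", "gyerek",
                  "szerelem", "vasútállomás", "kérem", "egészség"],
    "english": ["street", "the", "thank", "with", "station",
                "weather", "nothing", "thought", "through"],
}


def trigrams(word):
    """Character trigrams of a word, padded to mark word start and end."""
    padded = f"^^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Count trigrams per label; return the models and the smoothing vocabulary size."""
    models = {}
    for lang, words in samples.items():
        counts = Counter(g for w in words for g in trigrams(w))
        models[lang] = (counts, sum(counts.values()))
    # Combined trigram vocabulary across labels, plus one slot for unseen trigrams.
    vocab = {g for counts, _ in models.values() for g in counts}
    return models, len(vocab) + 1


def label(word, models, vocab_size):
    """Assign the label whose add-one-smoothed trigram model scores the word highest."""
    def log_prob(lang):
        counts, total = models[lang]
        return sum(math.log((counts[g] + 1) / (total + vocab_size))
                   for g in trigrams(word))
    return max(models, key=log_prob)


if __name__ == "__main__":
    models, vocab_size = train(TRAINING)
    for w in ["köszönet", "three"]:
        print(w, label(w, models, vocab_size))
```

A real TTS front end would label words in running text and would need far larger training corpora; working at the character level lets such a model still score words it has never seen.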
