Paper Information - Improving Pronunciation Accuracy of Proper Names with Language Origin Classes


Pronouncing proper names that come from many different source languages is an extremely hard task, even for humans. This thesis attempts to improve the automatic pronunciation of proper names by modeling the way humans do it, and tries to eliminate synthesis errors that humans would never make. It does so by taking into account the source language and language family of each name and by adding that information as features to the pronunciation models, either directly or indirectly. This approach does improve pronunciation accuracy; however, to assess its true merit, we would need to develop a more accurate language identifier. Ultimately, the data we would like to have for training our models is a list of proper names tagged both with their phonetic transcription and with the language they come from. A new approach this thesis begins to investigate is the unsupervised clustering of proper names to derive language classes in a data-driven way. With this approach, no language classes (Catalan, English, French, German, etc.) need to be determined a priori; instead, they are inferred from the names and their pronunciations. The clustering method takes into account letter trigrams as well as their aligned pronunciations at training time. Experiments using the classes derived from unsupervised clustering are still preliminary and have not yet yielded an improvement in the pronunciation accuracy of proper names.
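The idea of feeding a language guess into the pronunciation model as a feature can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the add-one-smoothed trigram scorer, the names `TrigramLanguageGuesser` and `letter_features`, the two-letter context window, and the tiny name list are all hypothetical and chosen only to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def letter_trigrams(name):
    """Return the letter trigrams of a name, padded with '#' boundary marks."""
    padded = "##" + name.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramLanguageGuesser:
    """Toy per-language letter-trigram model with add-one smoothing (hypothetical)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # language -> trigram counts
        self.vocab = set()                  # all trigrams seen in training

    def train(self, tagged_names):
        """tagged_names: iterable of (name, language) pairs."""
        for name, language in tagged_names:
            for tri in letter_trigrams(name):
                self.counts[language][tri] += 1
                self.vocab.add(tri)

    def log_score(self, name, language):
        """Smoothed log-likelihood of the name's trigrams under one language."""
        counts = self.counts[language]
        total = sum(counts.values()) + len(self.vocab) + 1
        return sum(math.log((counts[tri] + 1) / total)
                   for tri in letter_trigrams(name))

    def guess(self, name):
        """Return the language whose trigram model scores the name highest."""
        return max(sorted(self.counts),
                   key=lambda lang: self.log_score(name, lang))


def letter_features(name, index, language):
    """Features for predicting the phone of name[index].

    The guessed language is added next to the usual letter context,
    which is the 'direct' use of language information as a feature.
    """
    padded = "##" + name.lower() + "##"
    center = index + 2
    return {
        "letter": padded[center],
        "left": padded[center - 2:center],
        "right": padded[center + 1:center + 3],
        "language": language,
    }


# Made-up training list, for illustration only.
guesser = TrigramLanguageGuesser()
guesser.train([
    ("Schmidt", "German"), ("Schneider", "German"), ("Schulz", "German"),
    ("Rossi", "Italian"), ("Bianchi", "Italian"), ("Ricci", "Italian"),
])
language = guesser.guess("Schwarz")
features = letter_features("Schwarz", 0, language)
```

In a setup like this, the feature dictionaries would be passed to whatever letter-to-sound learner is in use, so the same letter context can map to different phones depending on the guessed language. The quality of that guess directly limits the benefit, which is consistent with the abstract's remark that a more accurate language identifier is needed.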

Ariadna Font Llitjós | A. F. Llitjós
