Improving Pronunciation Accuracy of Proper Names with Language Origin Classes

Pronouncing proper names that come from many different languages is an extremely hard task, even for humans. This thesis attempts to improve the automatic pronunciation of proper names by modeling the way humans approach the task, with the aim of eliminating synthesis errors that a human would never make. It does so by taking each name's language and language family of origin into account and adding this information, directly or indirectly, as features in the pronunciation models. This approach does improve pronunciation accuracy; however, to assess its true value we would need a more accurate language identifier. Ultimately, the ideal training data for our models is a list of proper names tagged both with their phonetic transcription and with their language of origin.

A new approach that this thesis begins to investigate is the unsupervised clustering of proper names to derive language classes in a data-driven way. With this approach, no language classes (Catalan, English, French, German, etc.) need to be defined a priori; instead, they are inferred from the names and their pronunciations. At training time, the clustering method takes into account letter trigrams as well as their aligned pronunciations. Experiments using the classes derived from unsupervised clustering are still preliminary and have not yet improved the pronunciation accuracy of proper names.
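The supervised part of this approach, in which a language identifier labels each name and the label is added as a feature to the letter-to-sound model, can be sketched as follows. This is a minimal illustration, not the thesis's implementation. It assumes a naive Bayes classifier over letter trigrams (add-one smoothing, uniform prior) as a stand-in language identifier, and a fixed window of neighbouring letters as the context for each letter. The names `trigrams`, `TrigramLanguageID` and `letter_features`, the `#` and `_` padding symbols, and the toy German and Italian training names are all hypothetical.

```python
import math
from collections import Counter


def trigrams(name):
    """Letter trigrams of a lowercased name; '#' marks the word boundaries."""
    padded = f"#{name.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramLanguageID:
    """Hypothetical naive Bayes language identifier over letter trigrams.

    Uses add-one (Laplace) smoothing and a uniform prior over languages.
    """

    def __init__(self, training):
        # training: dict mapping a language label to a list of example names
        self.counts = {}
        self.totals = {}
        vocab = set()
        for lang, names in training.items():
            counts = Counter(t for name in names for t in trigrams(name))
            self.counts[lang] = counts
            self.totals[lang] = sum(counts.values())
            vocab.update(counts)
        # One extra slot reserves probability mass for unseen trigrams.
        self.vocab_size = len(vocab) + 1

    def log_score(self, name, lang):
        """Smoothed log-likelihood of the name's trigrams under one language."""
        counts, total = self.counts[lang], self.totals[lang]
        denom = total + self.vocab_size
        return sum(math.log((counts[t] + 1) / denom) for t in trigrams(name))

    def classify(self, name):
        """Return the language label with the highest log-likelihood."""
        return max(self.counts, key=lambda lang: self.log_score(name, lang))


def letter_features(name, lang, window=3):
    """One feature row per letter: the letter, `window` letters of context on
    each side ('_' pads past the word edges), and the language tag."""
    letters = name.lower()
    padded = "_" * window + letters + "_" * window
    rows = []
    for i in range(len(letters)):
        # letters[i] sits at padded[i + window], so this slice is centred on it
        rows.append(list(padded[i:i + 2 * window + 1]) + [lang])
    return rows


if __name__ == "__main__":
    lid = TrigramLanguageID({
        "german": ["schmidt", "schneider", "schwarz", "fischer", "schulz"],
        "italian": ["rossi", "ricci", "romano", "bianchi", "esposito"],
    })
    for name in ["schubert", "rossini"]:
        lang = lid.classify(name)
        print(name, lang, letter_features(name, lang)[0])
```

Each feature row could then be paired with the phone aligned to its centre letter and passed to any classifier, for example a decision tree, so that the predicted language tag becomes one more attribute the learner may split on. Because the tag comes from an imperfect identifier, its errors flow directly into the pronunciation model. This is why the quality of the language identifier limits how fairly the approach can be judged.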
