Grapheme-to-Phoneme Conversion in the Era of Globalization

This thesis addresses phonetic transcription in the framework of text-to-speech conversion, with a focus on improving the adaptability, reliability and multilingual support of the phonetic module. Language is constantly evolving, which makes adaptability one of the major concerns in phonetic transcription. Phonetic transcription was addressed from a data-based approach. Several classifiers, such as Decision Trees, Finite State Transducers and Hidden Markov Models, were studied and applied to the grapheme-to-phoneme (G2P) conversion task. In addition, we analyzed pronunciation by analogy, considering different strategies. Further improvements were obtained by applying the transformation-based error-driven learning algorithm; the most significant improvements were obtained for the classifiers with higher error rates. The experimental results show that the adaptability of the phonetic module was improved, with word error rates as low as 12% for English.

Next, steps were taken towards increasing the reliability of the output of the phonetic module. Although the G2P results were quite good, we propose dictionary fusion in order to achieve a higher level of reliability. The way pronunciations are represented in different lexica depends on many factors, such as the expert's opinion, local accent specifications, the phonetic alphabet chosen and the assimilation level (for proper names). There are often discrepancies between the pronunciations of the same word found in different lexica. The fusion system learns phoneme-to-phoneme transformations and converts pronunciations from the source lexicon into pronunciations from the target lexicon.

Another important part of this thesis consisted in facing the challenge of multilingualism, a phenomenon that is becoming a usual part of our daily lives. Our goal was to obtain pronunciations for foreign inclusions that would not be totally unfamiliar either to native or proficient speakers of the language to be adapted, or to speakers of this language with average to low proficiency. Nativization by analogy was applied to both the orthographic and the phonetic forms of the word. The results show that phonetic analogy performs better than analogy in the orthographic domain for both proper names and common nouns. Both the objective and the perceptual results support the validity of this proposal.
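The abstract reports results as word error rates. As a minimal sketch of how such a figure is commonly computed for G2P output (not the evaluation code used in the thesis), the example below counts a word as an error when its predicted phoneme sequence differs from the reference in any position, and also computes a phoneme-level error rate from edit distance. The function names and the toy phoneme symbols are assumptions made for illustration only.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(predicted, reference) -> float:
    """Fraction of words whose predicted pronunciation is not an exact match."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    wrong = sum(1 for p, r in zip(predicted, reference) if list(p) != list(r))
    return wrong / len(reference)


def phoneme_error_rate(predicted, reference) -> float:
    """Total edit distance divided by the total number of reference phonemes."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    total_edits = sum(edit_distance(p, r) for p, r in zip(predicted, reference))
    total_phonemes = sum(len(r) for r in reference)
    return total_edits / total_phonemes


# Hypothetical toy data: four words, one of them with a single wrong vowel.
reference = [["k", "ae", "t"], ["d", "ao", "g"], ["f", "ih", "sh"], ["b", "er", "d"]]
predicted = [["k", "ae", "t"], ["d", "aa", "g"], ["f", "ih", "sh"], ["b", "er", "d"]]

wer = word_error_rate(predicted, reference)
per = phoneme_error_rate(predicted, reference)
print(f"WER = {wer:.2%}, PER = {per:.2%}")
```

Because a single wrong phoneme makes the whole word count as an error, the word error rate is always at least as strict as the phoneme-level measure on the same data.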
