Pronunciation Adaptation at the Lexical Level

There are various kinds of adaptation that can be used to enhance the performance of automatic speech recognizers. This paper is about pronunciation adaptation at the lexical level, i.e. about modeling pronunciation variation at the lexical level. In the early years of automatic speech recognition (ASR) research, the amount of pronunciation variation was limited by the use of isolated words. As the focus gradually shifted from isolated words to conversational speech, the amount of pronunciation variation present in the speech signals increased, and so did the need to model it. This is reflected in the growing attention paid to this topic. In this paper, an overview of the studies on lexicon adaptation is presented. Furthermore, many examples are given of situations in which lexicon adaptation is likely to improve the performance of speech recognizers. Finally, it is argued that some assumptions made in current standard ASR systems are not in line with the properties of the speech signals. Consequently, the problem of pronunciation variation at the lexical level probably cannot be solved simply by adding new transcriptions to the lexicon, as is generally done at the moment.
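This is a minimal Python sketch of a multiple-pronunciation lexicon. It shows concretely what "adding new transcriptions to the lexicon" means. The sketch assumes the following:

* Each word maps to several phone transcriptions.
* Each variant's prior probability is its relative frequency among the observed transcriptions.

The class name `VariantLexicon`, its methods, and the ARPAbet-style phone strings for "probably" are all hypothetical and chosen only for illustration. They are not the method of any particular system.

```python
from collections import Counter, defaultdict


class VariantLexicon:
    """Hypothetical toy lexicon storing several phone transcriptions per word.

    Assumption: a variant's prior P(pronunciation | word) is its relative
    frequency among the transcriptions added for that word.
    """

    def __init__(self) -> None:
        # word -> Counter mapping each transcription (tuple of phones) to a count
        self._counts: dict[str, Counter] = defaultdict(Counter)

    def add_variant(self, word: str, phones: str, count: int = 1) -> None:
        # Store the transcription as a tuple so it can serve as a dictionary key.
        self._counts[word][tuple(phones.split())] += count

    def variants(self, word: str) -> dict[tuple[str, ...], float]:
        # Renormalize the counts into priors; unknown words yield an empty dict.
        counts = self._counts.get(word, Counter())
        total = sum(counts.values())
        return {pron: n / total for pron, n in counts.items()}


# Usage: a canonical and a reduced (illustrative) pronunciation of "probably".
lex = VariantLexicon()
lex.add_variant("probably", "p r aa b ax b l iy", count=3)  # canonical form
lex.add_variant("probably", "p r aa b l iy", count=1)       # reduced form
print(lex.variants("probably"))
```

Each added variant becomes another entry under the same word, and the priors are renormalized. This is the common practice referred to in the last sentence of the abstract. The paper argues that this practice is probably not sufficient on its own.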
