Pronunciation adaptation at the lexical level

Various kinds of adaptation can be used to enhance the performance of automatic speech recognizers. This paper addresses pronunciation adaptation at the lexical level, i.e. modeling pronunciation variation at the lexical level. In the early years of automatic speech recognition (ASR) research, the amount of pronunciation variation was limited by using isolated words. As the focus gradually shifted from isolated words to conversational speech, the amount of pronunciation variation present in the speech signals increased, as did the need to model it. This is reflected in the growing attention to this topic. In this paper, an overview of the studies on lexicon adaptation is presented. Furthermore, many examples are given of situations in which lexicon adaptation is likely to improve the performance of speech recognizers. Finally, it is argued that some assumptions made in current standard ASR systems are not in line with the properties of the speech signals. Consequently, the problem of pronunciation variation at the lexical level probably cannot be solved by simply adding new transcriptions to the lexicon, as is generally done at the moment.
