Statistical language model adaptation: review and perspectives

Speech recognition performance is severely affected when the lexical, syntactic, or semantic characteristics of the discourse in the training and recognition tasks differ. The aim of language model adaptation is to exploit specific, albeit limited, knowledge about the recognition task to compensate for this mismatch. More generally, an adaptive language model seeks to maintain an adequate representation of the current task domain under changing conditions involving potential variations in vocabulary, syntax, content, and style. This paper presents an overview of the major approaches proposed to address this issue, and offers some perspectives regarding their comparative merits and associated trade-offs.
