Hierarchical statistical language models: experiments on in-domain adaptation

We introduce a hierarchical statistical language model, represented as a collection of local models plus a general sentence model. As an example, we combine a trigram general model with a probabilistic finite-state automaton (PFSA) local model for the class of decimal numbers, which is described in terms of sub-word units (graphemes). Because the local model is defined over graphemes rather than whole words, it extends the vocabulary of the overall model to an effectively infinite size, yet the hierarchical model still performs better than a word-based model. Using in-domain language model adaptation experiments, we show that well-trained local models encode enough linguistic information to be ported to new language models without re-estimation.
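The combination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the automaton topology, the transition probabilities (P_MORE_INT, P_POINT, P_MORE_FRAC), the toy trigram table, the class symbol <NUM>, and the floor value are all hypothetical choices made for illustration. The sketch only shows the general idea that a sentence score is the general model's score for the class symbol plus the local model's score for the grapheme string.

```python
import math

DIGITS = "0123456789"

# Hypothetical PFSA parameters (not taken from the paper).
P_MORE_INT = 0.5   # in the integer part: emit another digit
P_POINT = 0.2      # in the integer part: emit the decimal point
P_MORE_FRAC = 0.6  # in the fractional part: emit another digit


def number_logprob(token):
    """Log-probability of a grapheme string under a toy PFSA that
    accepts decimal numbers such as "42" or "3.75"; -inf if rejected."""
    state, logp = "start", 0.0
    for ch in token:
        if ch in DIGITS:
            if state == "start":        # first digit is mandatory
                logp += math.log(0.1)
                state = "int"
            elif state == "int":
                logp += math.log(P_MORE_INT * 0.1)
            elif state == "point":      # a digit must follow the point
                logp += math.log(0.1)
                state = "frac"
            else:                       # state == "frac"
                logp += math.log(P_MORE_FRAC * 0.1)
        elif ch == "." and state == "int":
            logp += math.log(P_POINT)
            state = "point"
        else:
            return float("-inf")
    if state == "int":                  # stop after the integer part
        return logp + math.log(1.0 - P_MORE_INT - P_POINT)
    if state == "frac":                 # stop after the fractional part
        return logp + math.log(1.0 - P_MORE_FRAC)
    return float("-inf")


# Toy trigram table over words and the class symbol <NUM> (made-up values).
TRIGRAM = {
    ("<s>", "<s>", "it"): 0.4,
    ("<s>", "it", "costs"): 0.3,
    ("it", "costs", "<NUM>"): 0.5,
    ("costs", "<NUM>", "dollars"): 0.6,
    ("<NUM>", "dollars", "</s>"): 0.7,
}
FLOOR = 1e-6  # crude stand-in for smoothing of unseen trigrams


def sentence_logprob(tokens):
    """General trigram score over class-tagged units, plus the local
    PFSA score for every token recognised as a decimal number."""
    history, total = ("<s>", "<s>"), 0.0
    for tok in tokens + ["</s>"]:
        local = number_logprob(tok)
        unit = "<NUM>" if local > float("-inf") else tok
        total += math.log(TRIGRAM.get(history + (unit,), FLOOR))
        if unit == "<NUM>":
            total += local
        history = (history[1], unit)
    return total


if __name__ == "__main__":
    print(sentence_logprob(["it", "costs", "3.75", "dollars"]))
    print(sentence_logprob(["it", "costs", "42", "dollars"]))
```

In this sketch any digit string receives a nonzero probability even though the trigram table contains only the single symbol <NUM>, which is the sense in which a grapheme-level local model makes the vocabulary unbounded. The local model is also independent of the trigram table, so it could be reused with a different general model without changing its parameters.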
