Web-based possibilistic language models for automatic speech recognition

Abstract: This paper describes a new kind of language model based on possibility theory. The purpose of these models is to make better use of the data available on the Web for language modeling; in particular, they aim to integrate information about impossible word sequences. We address the two main problems of using this kind of model: how to estimate the measures for word sequences, and how to integrate the model into an automatic speech recognition (ASR) system. We propose a word-sequence possibilistic measure and a practical estimation method based on word-sequence statistics, which is particularly suited to estimation from Web data. We develop several strategies and formulations for using these models in a classical ASR engine, which relies on probabilistic modeling of the speech recognition process. This work is evaluated on two typical usage scenarios: broadcast news transcription with very large training sets, and transcription of medical videos in a specialized domain with only very limited training data. The results show that the possibilistic models yield a significantly lower word error rate on the specialized-domain task, where classical n-gram models fail due to the lack of training material. For broadcast news, the probabilistic models remain better than the possibilistic ones. However, a log-linear combination of the two kinds of models outperforms each model used individually, which indicates that possibilistic models bring information that is not modeled by probabilistic ones.
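The log-linear combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names, the `log_prob` and `possibility` fields, the two weights, and the floor used to avoid taking the logarithm of zero are all hypothetical. The sketch only shows the general idea of adding a weighted log of a possibility value in [0, 1] to a weighted probabilistic log-score when ranking hypotheses.

```python
import math


def log_linear_score(log_prob, possibility, w_prob=1.0, w_poss=1.0, floor=1e-10):
    """Weighted sum of a probabilistic log-score and the log of a possibility value.

    All parameters are hypothetical: `log_prob` stands for an n-gram log-probability,
    `possibility` for a possibilistic measure in [0, 1], and `floor` keeps the
    logarithm finite when a sequence is judged impossible (possibility == 0).
    """
    return w_prob * log_prob + w_poss * math.log(max(possibility, floor))


def best_hypothesis(hypotheses, w_prob=1.0, w_poss=1.0):
    """Return the hypothesis with the highest combined score."""
    return max(
        hypotheses,
        key=lambda h: log_linear_score(h["log_prob"], h["possibility"], w_prob, w_poss),
    )


# Made-up values for illustration only, not results from the paper.
hypotheses = [
    {"text": "hypothesis A", "log_prob": -10.0, "possibility": 1.0},
    {"text": "hypothesis B", "log_prob": -9.0, "possibility": 0.01},
]
combined_choice = best_hypothesis(hypotheses)
probabilistic_only_choice = best_hypothesis(hypotheses, w_poss=0.0)
```

With these made-up numbers, the probabilistic score alone prefers hypothesis B, while the combined score prefers hypothesis A because B's low possibility value is penalized. In practice the weights would have to be tuned on held-out data.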
