Similarity Based Language Model Construction for Voice Activated Open-Domain Question Answering

This paper describes a novel method of constructing a language model for speech recognition of inputs with a particular style, using a large-scale Web archive. Our target is an open-domain voice-activated QA system, whose speech recognition module must recognize relatively short, domain-independent questions. The central issue is how to prepare a large-scale training corpus at low cost, and we tackle this problem by combining an existing domain adaptation method with distributional word similarity. From 500 seed sentences and 600 million Web pages, we constructed a language model covering 413,000 words. It achieved an average improvement of 3.25 points in word error rate over a baseline model constructed from randomly sampled Web sentences.
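For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between the recognizer output and the reference transcript, divided by the number of reference words. An improvement in "points" is usually read as an absolute difference between two WER percentages. The sketch below shows this standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, not the scoring setup used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumes both strings are already segmented into words separated by
    whitespace (an illustrative simplification).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "what is the capital of japan"
    hyp = "what is a capital japan"
    # One substitution (the -> a) and one deletion (of) over six reference words.
    print(f"WER = {100 * word_error_rate(ref, hyp):.2f}%")
```

Under this definition, two systems are compared by scoring each against the same reference transcripts and subtracting the resulting percentages.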
