A New Bigram-PLSA Language Model for Speech Recognition

A novel method for combining the bigram model with probabilistic latent semantic analysis (PLSA) is introduced for language modeling. The motivation is to relax the "bag of words" assumption that is fundamental to latent topic models, including PLSA. An EM-based parameter estimation technique for the proposed model is presented. Previous attempts to incorporate word order into the PLSA model are surveyed and compared with the proposed model, both theoretically and experimentally. Perplexity is used to compare the effectiveness of recently introduced models with that of the proposed model, and further experiments are carried out on continuous speech recognition (CSR) tasks using word error rate (WER) as the evaluation criterion. The results demonstrate the superiority of the new bigram-PLSA model over Nie et al.'s bigram-PLSA model and the standard PLSA model: on the BLLIP WSJ corpus, the new model achieves about a 12% reduction in perplexity and a 2.8% WER improvement over Nie et al.'s bigram-PLSA model.
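To make the modeling idea concrete, a sketch of the kind of factorization involved is given below. The notation (w_i for the i-th word, d for the document, z for a latent topic) and the second equation's exact conditioning are illustrative assumptions, not a restatement of the paper's equations. Nie et al.'s bigram-PLSA model ties the topic mixture to the document alone,

$$ P(w_i \mid w_{i-1}, d) \;=\; \sum_{z} P(w_i \mid w_{i-1}, z)\, P(z \mid d), $$

whereas relaxing the bag-of-words assumption further amounts to letting the topic depend on local context as well, e.g.

$$ P(w_i \mid w_{i-1}, d) \;=\; \sum_{z} P(w_i \mid w_{i-1}, z)\, P(z \mid w_{i-1}, d). $$

In either case the parameters can be fit by EM: the E-step computes the topic posterior for every observed (document, bigram) pair, and the M-step re-estimates the conditional word and topic distributions from the resulting expected counts.

Perplexity, the comparison measure used throughout, is the exponentiated average negative log-probability a model assigns to held-out text. A minimal sketch of its computation follows; the function name and inputs are hypothetical, chosen only for illustration.

import math

def perplexity(word_probs):
    """Perplexity of a test set, given the model probability of each word.

    word_probs: iterable of P(w_i | history, d) values, one per test word.
    Lower perplexity means the model predicts the held-out text better.
    """
    total_log_prob = 0.0
    n_words = 0
    for p in word_probs:
        total_log_prob += math.log(p)
        n_words += 1
    # Perplexity is the inverse geometric mean of the word probabilities.
    return math.exp(-total_log_prob / n_words)

print(perplexity([0.1, 0.2, 0.05]))  # 10.0 (geometric mean 0.1)
print(perplexity([0.2, 0.4, 0.10]))  # 5.0 (geometric mean 0.2)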

[1] P. Lamere et al., "Sphinx-4: a flexible open source framework for speech recognition," 2004.

[2] T. L. Griffiths et al., "Integrating topics and syntax," in NIPS, 2004.

[3] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. A. Harshman, "Indexing by latent semantic analysis," J. Am. Soc. Inf. Sci., 1990.

[4] M. Federico, "Language model adaptation through topic decomposition and MDI estimation," in Proc. IEEE ICASSP, 2002.

[5] T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learning, 2001.

[6] J. Nie, R. Li, D. Luo, and X. Wu, "Refine bigram PLSA model by assigning latent topics unevenly," in Proc. IEEE ASRU, 2007.

[7] T. Hofmann et al., "Learning from dyadic data," in NIPS, 1998.

[8] I. H. Witten and T. C. Bell, "The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression," IEEE Trans. Inf. Theory, 1991.

[9] M. Girolami and A. Kabán, "Simplicial mixtures of Markov chains: distributed modelling of dynamic user profiles," in NIPS, 2003.

[10] D. Mrva and P. C. Woodland, "Unsupervised language model adaptation for Mandarin broadcast conversation transcription," in INTERSPEECH, 2006.

[11] H. M. Wallach, "Topic modeling: beyond bag-of-words," in ICML, 2006.

[12] X. Wang and A. McCallum, "A note on topical n-grams," 2005.

[13] J. R. Bellegarda, "Exploiting latent semantic information in statistical language modeling," Proceedings of the IEEE, 2000.

[14] S. M. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer," IEEE Trans. Acoust. Speech Signal Process., 1987.

[15] D. Gildea and T. Hofmann, "Topic-based language models using EM," in EUROSPEECH, 1999.

[16] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, 2007.

[17] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," J. Mach. Learn. Res., 2003.