Topic-Dependent Language Model with Voting on Noun History

Language modeling is an important field of study in automatic speech recognition (ASR). A language model (LM) helps the acoustic model find the word sequence corresponding to a given speech signal; without one, an ASR system has no knowledge of the language and struggles to recover the correct word sequence. In recent years, researchers have tried to incorporate long-range dependencies into statistical word-based n-gram LMs. One such long-range dependency is topic. Unlike words, topic is not directly observable, so the meanings behind the words must be inferred to identify it. This research is based on the premise that nouns carry topic information. We propose a new approach to topic-dependent language modeling in which the topic is decided in an unsupervised manner. Latent Semantic Analysis (LSA) is employed to reveal hidden (latent) relations among the nouns in the context words. To decide the topic of an event, a fixed-size window of the word history is observed, and voting is then carried out based on noun-class occurrences weighted by a confidence measure. Experiments were conducted on an English corpus and a Japanese corpus: the Wall Street Journal corpus and the Mainichi Shimbun (Japanese newspaper) corpus. The results show that the proposed method yields lower perplexity than the comparative baselines, including a word-based and a class-based n-gram LM, their interpolated LM, a cache-based LM, an n-gram-based topic-dependent LM, and a topic-dependent LM based on Latent Dirichlet Allocation (LDA). N-best list rescoring was also conducted to validate the method's applicability in ASR systems.
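To make the idea concrete, the following is a minimal sketch, not the paper's actual implementation: a toy noun-by-context count matrix is factored with truncated SVD (the core LSA step), each noun is assigned to the latent dimension where it loads most strongly, and the topic of an event is decided by confidence-weighted voting over the nouns in a fixed-size history window. All data, the number of latent classes, and the use of loading magnitude as the confidence measure are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy data: noun-by-context co-occurrence counts.
# Rows = nouns, columns = contexts (finance-leaning cols 0-1, sports-leaning cols 2-3).
nouns = ["stock", "market", "share", "game", "team", "player"]
counts = np.array([
    [4, 3, 0, 0],   # stock
    [5, 2, 0, 1],   # market
    [3, 4, 1, 0],   # share
    [0, 0, 5, 4],   # game
    [0, 1, 4, 3],   # team
    [1, 0, 3, 5],   # player
], dtype=float)

# LSA step: truncated SVD projects nouns into a low-rank latent space
# where topically related nouns cluster together.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2                                 # number of latent noun classes (assumed)
noun_vecs = U[:, :k] * S[:k]          # noun representations in latent space

# Assign each noun to its strongest latent dimension; the loading
# magnitude serves here as a crude stand-in for the confidence measure.
noun_class = {n: int(np.argmax(np.abs(v))) for n, v in zip(nouns, noun_vecs)}
noun_conf = {n: float(np.max(np.abs(v))) for n, v in zip(nouns, noun_vecs)}

def vote_topic(history, window=5):
    """Decide the topic of an event by confidence-weighted voting over
    noun-class occurrences in a fixed-size word-history window."""
    scores = {}
    for w in history[-window:]:
        if w in noun_class:           # only nouns cast topic votes
            c = noun_class[w]
            scores[c] = scores.get(c, 0.0) + noun_conf[w]
    return max(scores, key=scores.get) if scores else None

finance_topic = vote_topic(["the", "stock", "market", "rose", "share"])
sports_topic = vote_topic(["the", "team", "won", "the", "game"])
```

In a full system, the topic decided this way would condition the LM's word probabilities (e.g., via a topic-dependent n-gram interpolated with the baseline); the sketch only covers the unsupervised topic-decision step.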
