Use of latent words language models in ASR: A sampling-based implementation

This paper applies the latent words language model (LWLM) to automatic speech recognition (ASR). An LWLM is trained by taking related words into account, i.e., by grouping words that are similar in meaning and syntactic role. This means, for example, that if a technical word and a general word play a similar syntactic role, they are assigned similar probabilities, so the LWLM can be expected to perform robustly across multiple domains. Furthermore, interpolating the LWLM with a standard n-gram LM should be effective, since the two LMs are trained under different criteria. In addition, this paper describes an approximation of the LWLM for ASR in which words are randomly sampled from the LWLM and a standard word n-gram language model is then trained on the sampled text; this enables one-pass decoding. Our experimental results show that the LWLM performs comparably to the hierarchical Pitman-Yor language model (HPYLM) on an in-domain task and more robustly on out-of-domain tasks. Moreover, interpolating the LWLM with the HPYLM yields a lower word error rate on all tasks.
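
To make the sampling-based approximation concrete, the sketch below outlines one plausible realization in Python: latent word sequences are sampled from a latent n-gram model, observed words are emitted from each latent word, and a standard word n-gram model is estimated from the sampled corpus. The tables latent_ngram and emission, the trigram order, and all function names are hypothetical illustrations, not the paper's implementation; an actual system would sample from a Gibbs-trained LWLM and apply smoothing before decoding.

    # Minimal sketch (assumptions noted above): the trained LWLM is represented
    # by two hypothetical tables, latent_ngram[(h_prev2, h_prev1)] -> {h: prob}
    # for transitions over latent words, and emission[h] -> {w: prob} for the
    # distribution of observed words given a latent word.
    import random
    from collections import defaultdict

    BOS, EOS = "<s>", "</s>"

    def draw(dist):
        """Sample a key from a {item: probability} dictionary."""
        r = random.random()
        total = 0.0
        for item, p in dist.items():
            total += p
            if r <= total:
                return item
        return item  # numerical fallback when probabilities sum slightly below 1

    def sample_sentence(latent_ngram, emission, max_len=50):
        """Sample latent words from the latent n-gram model, then emit observed words."""
        h_prev2, h_prev1 = BOS, BOS
        words = []
        for _ in range(max_len):
            h = draw(latent_ngram[(h_prev2, h_prev1)])
            if h == EOS:
                break
            words.append(draw(emission[h]))
            h_prev2, h_prev1 = h_prev1, h
        return words

    def train_trigram_counts(sentences):
        """Collect word trigram counts from the sampled corpus; in practice the
        counts would be smoothed (e.g., Kneser-Ney) and compiled into the LM
        used for one-pass decoding."""
        counts = defaultdict(int)
        for words in sentences:
            seq = [BOS, BOS] + words + [EOS]
            for i in range(2, len(seq)):
                counts[tuple(seq[i - 2:i + 1])] += 1
        return counts

    # Usage: generate a large sampled corpus, then estimate the word n-gram LM.
    # corpus = [sample_sentence(latent_ngram, emission) for _ in range(1000000)]
    # ngram_counts = train_trigram_counts(corpus)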
