Hierarchical Latent Words Language Models for Robust Modeling to Out-of-Domain Tasks

This paper focuses on language modeling that is robust enough to support tasks in domains beyond the training data. To this end, we propose a hierarchical latent words language model (h-LWLM). The proposed model can be regarded as a generalized form of the standard LWLM. The key advance is the introduction of multiple latent variable spaces with a hierarchical structure, which can flexibly account for linguistic phenomena not present in the training data. This paper details the model definition, a training method based on layer-wise inference, and a practical usage in natural language processing tasks via an approximation technique. Experiments on speech recognition show the effectiveness of the h-LWLM on out-of-domain tasks.
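
To make the generative story concrete, below is a minimal Python sketch of how a hierarchical latent words model could generate text, assuming a latent trigram transition model at the deepest layer and per-layer emission distributions cascading down to the surface words. All names here (`trans`, `emissions`, `sample_corpus`) and the exact factorization are illustrative assumptions, not the authors' implementation.

```python
import random

def sample(dist):
    """Draw one key from a {item: probability} dict."""
    items, probs = zip(*dist.items())
    return random.choices(items, weights=probs)[0]

def generate(trans, emissions, length, bos="<s>"):
    """Generate `length` surface words: sample a latent-word sequence at
    the deepest layer with a latent trigram model, then emit each latent
    word down through the hierarchy to the observed (surface) layer.

    trans:     {(h_prev2, h_prev1): {h: P(h | context)}}  deepest layer
    emissions: list of {h_upper: {h_lower: P(h_lower | h_upper)}},
               ordered from the deepest layer down to the surface layer
    """
    context, words = (bos, bos), []
    for _ in range(length):
        h = sample(trans[context])      # latent word at the deepest layer
        context = (context[1], h)
        for emit in emissions:          # cascade down through the layers
            h = sample(emit[h])
        words.append(h)                 # observed surface word
    return words

def sample_corpus(trans, emissions, n_sentences, sent_len):
    """Sketch of one plausible approximation for practical use: draw a
    large pseudo-corpus from the model, then train a standard back-off
    n-gram LM on it with an external toolkit so the model can be applied
    in one-pass ASR decoding."""
    return [generate(trans, emissions, sent_len) for _ in range(n_sentences)]
```

In this reading, robustness to out-of-domain input comes from the emission cascade: a surface word that never appeared in a given context during training can still receive probability mass through shared latent words higher in the hierarchy.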
