Hierarchical Latent Words Language Models for Robust Modeling to Out-of-Domain Tasks

This paper focuses on language modeling that is robust enough to support tasks in domains beyond the training data. To this end, we propose a hierarchical latent words language model (h-LWLM). The proposed model can be regarded as a generalized form of the standard LWLM. The key advance is the introduction of multiple latent variable spaces with a hierarchical structure, which can flexibly account for linguistic phenomena not present in the training data. This paper details the model definition, a training method based on layer-wise inference, and a practical usage in natural language processing tasks via an approximation technique. Experiments on speech recognition show the effectiveness of the h-LWLM on out-of-domain tasks.
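
To make the generative story concrete, below is a minimal Python sketch of how a hierarchical latent words model could generate text, assuming a latent trigram transition model at the deepest layer and per-layer emission distributions cascading down to the surface words. All names here (`trans`, `emissions`, `sample_corpus`) and the exact factorization are illustrative assumptions, not the authors' implementation.

```python
import random

def sample(dist):
    """Draw one key from a {item: probability} dict."""
    items, probs = zip(*dist.items())
    return random.choices(items, weights=probs)[0]

def generate(trans, emissions, length, bos="<s>"):
    """Generate `length` surface words: sample a latent-word sequence at
    the deepest layer with a latent trigram model, then emit each latent
    word down through the hierarchy to the observed (surface) layer.

    trans:     {(h_prev2, h_prev1): {h: P(h | context)}}  deepest layer
    emissions: list of {h_upper: {h_lower: P(h_lower | h_upper)}},
               ordered from the deepest layer down to the surface layer
    """
    context, words = (bos, bos), []
    for _ in range(length):
        h = sample(trans[context])      # latent word at the deepest layer
        context = (context[1], h)
        for emit in emissions:          # cascade down through the layers
            h = sample(emit[h])
        words.append(h)                 # observed surface word
    return words

def sample_corpus(trans, emissions, n_sentences, sent_len):
    """Sketch of one plausible approximation for practical use: draw a
    large pseudo-corpus from the model, then train a standard back-off
    n-gram LM on it with an external toolkit so the model can be applied
    in one-pass ASR decoding."""
    return [generate(trans, emissions, sent_len) for _ in range(n_sentences)]
```

In this reading, robustness to out-of-domain input comes from the emission cascade: a surface word that never appeared in a given context during training can still receive probability mass through shared latent words higher in the hierarchy.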
