Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model