Improving Neural Language Models with a Continuous Cache

We propose an extension to neural network language models that adapts their predictions to the recent history. Our model is a simplified version of memory-augmented networks: it stores past hidden activations as memory and accesses them through a dot product with the current hidden activation. This mechanism is very efficient and scales to very large memory sizes. We also draw a link between the use of external memory in neural networks and the cache models used with count-based language models. We demonstrate on several language modeling datasets that our approach performs significantly better than recent memory-augmented networks.
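To make the mechanism concrete, below is a minimal sketch in Python/NumPy of a cache distribution built from stored hidden states and combined with the base model's softmax by linear interpolation. It follows the abstract's description (dot-product access to past activations); the function names and the hyper-parameters `theta` (a flatness coefficient on the scores) and `lam` (the interpolation weight) are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def cache_distribution(h_t, past_h, past_next_words, vocab_size, theta=0.3):
    """Cache distribution over the vocabulary from stored hidden states.

    h_t:             current hidden activation, shape (d,)
    past_h:          stored past hidden activations h_1..h_{t-1}, shape (t-1, d)
    past_next_words: word id that followed each stored state, shape (t-1,)
    theta:           score flatness hyper-parameter (assumed name)
    """
    scores = past_h @ h_t                 # dot product with the current hidden state
    weights = np.exp(theta * scores)      # unnormalized weight of each memory slot
    p_cache = np.zeros(vocab_size)
    # Each slot votes for the word that followed it in the history.
    np.add.at(p_cache, past_next_words, weights)
    return p_cache / p_cache.sum()

def combine(p_vocab, p_cache, lam=0.1):
    # One simple way to merge the two distributions: linear interpolation.
    return (1.0 - lam) * p_vocab + lam * p_cache
```

Note that memory access here is a single matrix-vector product over the stored activations, with no parameters learned for the cache itself beyond the two scalar hyper-parameters, which is what lets the mechanism scale to very large memories.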
