Estimating Marginal Probabilities of n-grams for Recurrent Neural Language Models

Recurrent neural network language models (RNNLMs) are the current standard-bearer for statistical language modeling. However, RNNLMs only estimate probabilities for complete sequences of text, whereas some applications require context-independent phrase probabilities instead. In this paper, we study how to compute an RNNLM's marginal probability: the probability that the model assigns to a short sequence of text when the preceding context is not known. We introduce a simple method of altering the RNNLM training to make the model more accurate at marginal estimation. Our experiments demonstrate that the technique is effective compared to baselines, including the traditional RNNLM probability and an importance sampling approach. Finally, we show how marginal estimation can be used to improve an RNNLM by training its marginals to match n-gram probabilities from a larger corpus.
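For concreteness, the quantity of interest can be written as an expectation over unseen contexts. The display below is a minimal sketch using generic notation ($p_\theta$ for the model, $c$ for a latent preceding context, $w_{1:n}$ for the phrase, $K$ for the sample count), which is not necessarily the paper's own; it shows the naive Monte Carlo estimator that sampling-based baselines such as importance sampling refine:

\[
p_\theta(w_{1:n}) \;=\; \sum_{c} p_\theta(c)\, p_\theta(w_{1:n} \mid c)
\;\approx\; \frac{1}{K} \sum_{k=1}^{K} p_\theta\!\left(w_{1:n} \mid c^{(k)}\right),
\qquad c^{(k)} \sim p_\theta(\cdot),
\]

where $c$ ranges over possible preceding contexts, $p_\theta(c)$ is the model's probability of generating that context, and $p_\theta(w_{1:n} \mid c) = \prod_{i=1}^{n} p_\theta(w_i \mid c, w_{1:i-1})$ is the RNNLM's usual conditional probability of the phrase.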
