Towards Competitive N-gram Smoothing
Alon Orlitsky | Mesrob I. Ohannessian | Venkatadheeraj Pichapati | Moein Falahatgar
[1] Wojciech Zaremba, et al. Recurrent Neural Network Regularization, 2014, ArXiv.
[2] Eric P. Xing, et al. Language Modeling with Power Low Rank Ensembles, 2013, EMNLP.
[3] Ian R. Lane, et al. Neural network language models for low resource languages, 2014, INTERSPEECH.
[4] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[5] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.
[6] Dietrich Braess, et al. Bernstein polynomials and learning theory, 2004, J. Approx. Theory.
[7] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[8] Liam Paninski. Variational Minimax Estimation of Discrete Distributions under KL Loss, 2004, NIPS.
[9] Alon Orlitsky, et al. The power of absolute discounting: all-dimensional distribution estimation, 2017, NIPS.
[10] Richard Socher, et al. Regularizing and Optimizing LSTM Language Models, 2017, ICLR.
[11] Alon Orlitsky, et al. On Learning Distributions from their Samples, 2015, COLT.
[12] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[13] Alon Orlitsky, et al. Competitive Distribution Estimation: Why is Good-Turing Good, 2015, NIPS.
[14] Mari Ostendorf, et al. A Sparse Plus Low-Rank Exponential Language Model for Limited Resource Scenarios, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[15] Alon Orlitsky, et al. Near-Optimal Smoothing of Structured Conditional Probability Matrices, 2016, NIPS.
[16] Samy Bengio, et al. N-gram Language Modeling using Recurrent Neural Network Estimation, 2017, ArXiv.
[17] Masaaki Nagata, et al. Direct Output Connection for a High-Rank Language Model, 2018, EMNLP.
[18] Tomáš Mikolov. Statistical Language Models Based on Neural Networks, PhD thesis, Brno University of Technology, 2012.
[19] Mesrob I. Ohannessian, et al. Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications, 2014, arXiv:1412.8652.
[20] Yee Whye Teh, et al. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes, 2006, ACL.
[21] Ruslan Salakhutdinov, et al. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model, 2017, ICLR.
[22] A. Shapiro, et al. National Consortium for the Study of Terrorism and Responses to Terrorism, 2010.
[23] Di He, et al. FRAGE: Frequency-Agnostic Word Representation, 2018, NeurIPS.
[24] Hermann Ney, et al. Improved backing-off for M-gram language modeling, 1995, ICASSP.
[25] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[26] Daniel Jurafsky, et al. Data Noising as Smoothing in Neural Network Language Models, 2017, ICLR.
[27] Gregory Valiant, et al. Instance Optimal Learning, 2015, ArXiv.
[28] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953, Biometrika.
[29] Munther A. Dahleh, et al. Rare Probability Estimation under Regularly Varying Heavy Tails, 2012, COLT.