When and why are log-linear models self-normalizing?
暂无分享,去创建一个
[1] Ronald Rosenfeld,et al. Adaptive Statistical Language Modeling; A Maximum Entropy Approach , 1994 .
[2] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[3] Yoshua Bengio,et al. Neural Probabilistic Language Models , 2006 .
[4] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[5] Aapo Hyvärinen,et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.
[6] Navdeep Jaitly,et al. Hybrid speech recognition with Deep Bidirectional LSTM , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[7] Ashish Vaswani,et al. Decoding with Large-Scale Neural Language Models Improves Translation , 2013, EMNLP.
[8] Richard M. Schwartz,et al. Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.
[9] Brian Roark,et al. Backoff inspired features for maximum entropy language models , 2014, INTERSPEECH.