When and why are log-linear models self-normalizing?

Several techniques have recently been proposed for training "self-normalized" discriminative models. These techniques attempt to find parameter settings for which unnormalized model scores approximate the true label probability. However, the theoretical properties of such techniques (and of self-normalization generally) have not been investigated. This paper examines the conditions under which we can expect self-normalization to work. We characterize a general class of distributions that admit self-normalization, and prove generalization bounds for procedures that minimize empirical normalizer variance. Motivated by these results, we describe a novel variant of an established procedure for training self-normalized models. The new procedure avoids computing normalizers for most training examples, and decreases training time by as much as a factor of ten while preserving model quality.
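The abstract's central quantity is the normalizer Z(x) = Σ_y exp(θ·f(x, y)). A log-linear model is self-normalizing when log Z(x) stays near 0, so the unnormalized score θ·f(x, y) can stand in for log p(y | x). The Python sketch below is illustrative only and rests on several assumptions. The data layout (a per-label list of feature vectors plus a gold label index) and all function names are hypothetical. `subsampled_objective` is one assumed reading of "avoids computing normalizers for most training examples": it rewards the gold label's unnormalized score on every example, but pays a squared log-normalizer penalty (the form used by Devlin et al., 2014) only on a random fraction of examples. It is not claimed to be the paper's exact procedure.

```python
import math
import random

# Hypothetical toy layout: each example is (feats, gold), where feats[y] is the
# feature vector f(x, y) for label y and gold is the index of the true label.


def scores(theta, feats):
    """Unnormalized log-linear scores theta . f(x, y), one per label y."""
    return [sum(t * v for t, v in zip(theta, fy)) for fy in feats]


def log_normalizer(theta, feats):
    """log Z(x) = log sum_y exp(theta . f(x, y)), using the max-shift trick."""
    s = scores(theta, feats)
    m = max(s)
    return m + math.log(sum(math.exp(v - m) for v in s))


def normalizer_variance(theta, data):
    """Empirical variance of log Z(x) across the inputs in data."""
    zs = [log_normalizer(theta, feats) for feats, _ in data]
    mean = sum(zs) / len(zs)
    return sum((z - mean) ** 2 for z in zs) / len(zs)


def subsampled_objective(theta, data, alpha, frac, rng):
    """Assumed objective, not taken from the paper.

    The fit term uses only unnormalized gold scores (no log Z needed); the
    squared log-normalizer penalty is computed on a random subset of size k.
    """
    fit = sum(scores(theta, feats)[gold] for feats, gold in data)
    k = max(1, int(frac * len(data)))
    sample = rng.sample(data, k)
    penalty = sum(log_normalizer(theta, feats) ** 2 for feats, _ in sample)
    # Rescale by n / k so the sampled penalty is an unbiased estimate of the
    # penalty summed over all n examples.
    return fit - alpha * penalty * len(data) / k


if __name__ == "__main__":
    data = [
        ([[1.0, 0.0], [0.0, 1.0]], 0),
        ([[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]], 1),
    ]
    theta = [0.0, 0.0]
    # With theta = 0 every score is 0, so log Z(x) = log(number of labels).
    print(log_normalizer(theta, data[0][0]))
    print(normalizer_variance(theta, data))
    print(subsampled_objective(theta, data, alpha=0.1, frac=0.5,
                               rng=random.Random(0)))
```

Shrinking `frac` means log Z is evaluated for fewer examples per step, at the cost of a noisier (but still unbiased) estimate of the penalty.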

Dan Klein | Jacob Andreas
