DropMax: Adaptive Variational Softmax

We propose DropMax, a stochastic version of the softmax classifier that at each iteration drops non-target classes according to dropout probabilities decided adaptively for each instance. Specifically, we overlay binary masking variables over the class output probabilities, and these masks are learned input-adaptively via variational inference. This stochastic regularization has the effect of building an ensemble classifier out of exponentially many classifiers with different decision boundaries. Moreover, learning the dropout rates for non-target classes on each instance allows the classifier to focus more on classification against the most confusing classes. We validate our model on multiple public classification datasets, on which it obtains significantly improved accuracy over the regular softmax classifier and other baselines. Further analysis of the learned dropout probabilities shows that our model indeed selects the confusing classes more often when it performs classification.
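As a rough illustration of the masked softmax described above, the sketch below shows how per-instance retain probabilities can gate which classes enter the normalizer. It is a minimal NumPy sketch under simplifying assumptions: it uses hard Bernoulli masks rather than the continuous relaxation needed for gradient-based training of the dropout probabilities, and the function name `dropmax_probs` and the toy inputs are ours for illustration, not from the paper's released code.

```python
# Minimal sketch of an adaptively masked softmax (illustrative, not the
# paper's implementation). Hard Bernoulli masks are used here; training the
# retain probabilities end-to-end would require a continuous relaxation.
import numpy as np

rng = np.random.default_rng(0)

def dropmax_probs(logits, retain_logits, target=None, training=True):
    """Class probabilities with per-class, per-instance stochastic masking.

    logits:        (K,) class scores o_k from the base network.
    retain_logits: (K,) scores from which retain probabilities
                   rho_k = sigmoid(.) are computed (input-adaptive).
    target:        index of the true class; never dropped during training.
    """
    rho = 1.0 / (1.0 + np.exp(-retain_logits))           # retain probability per class
    if training:
        z = (rng.random(rho.shape) < rho).astype(float)  # sample Bernoulli masks
        if target is not None:
            z[target] = 1.0                              # keep the target class
    else:
        z = rho                                          # expected mask at test time
    # Masked softmax: dropped classes (z_k = 0) are excluded from the normalizer.
    weights = z * np.exp(logits - logits.max())
    weights += 1e-12                                     # guard against an all-zero mask
    return weights / weights.sum()

# Toy usage: 5 classes, class 2 is the target; classes 2 and 3 are "confusing"
# and are retained with high probability, so the classifier must separate them.
logits = np.array([1.0, 0.2, 2.5, 2.4, -1.0])
retain_logits = np.array([-2.0, -2.0, 3.0, 3.0, -2.0])
print(dropmax_probs(logits, retain_logits, target=2, training=True))
print(dropmax_probs(logits, retain_logits, training=False))
```

Sampling a fresh mask at every iteration is what yields the implicit ensemble over subsets of classes, while the learned retain probabilities concentrate the competition on the classes most easily confused with the target.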
