On the Implicit Bias of Dropout

Algorithmic approaches endow deep learning systems with an implicit bias that helps them generalize even in over-parameterized settings. In this paper, we focus on understanding the implicit bias induced by dropout, a popular technique used to avoid overfitting in deep learning. For single hidden-layer linear neural networks, we show that dropout tends to equalize the norms of the incoming/outgoing weight vectors across all hidden nodes. In addition, we provide a complete characterization of the optimization landscape induced by dropout.
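The equalization claim can be made concrete with a quick numerical check. Below is a minimal NumPy sketch (illustrative only, not code from the paper): writing a single hidden-layer linear network as the factorization U V^T, Bernoulli dropout on the hidden layer gives, in expectation, the plain squared loss plus a penalty proportional to the sum over hidden units of ||u_i||^2 ||v_i||^2, the product of each unit's outgoing and incoming squared norms; a penalty of this form is smallest, for a fixed product U V^T, when those per-unit norm products are balanced across hidden nodes. The dimensions, retain probability theta, and sample count below are assumptions chosen for illustration.

```python
# Monte Carlo check (sketch): the expected inverted-dropout objective
#   E_b || X - (1/theta) U diag(b) V^T ||_F^2,   b_i ~ Bernoulli(theta),
# matches the plain squared loss plus ((1 - theta)/theta) * sum_i ||u_i||^2 ||v_i||^2,
# the per-hidden-unit regularizer whose minimization equalizes norms across units.
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 8, 4, 16        # output dim, hidden width, input dim (illustrative)
theta = 0.8               # retain probability; drop probability is 1 - theta

X = rng.standard_normal((d, n))
U = rng.standard_normal((d, r))   # column u_i: outgoing weights of hidden unit i
V = rng.standard_normal((n, r))   # column v_i: incoming weights of hidden unit i

def expected_dropout_loss(num_samples=100_000):
    """Estimate the dropout objective by sampling Bernoulli keep/drop masks."""
    total = 0.0
    for _ in range(num_samples):
        b = rng.binomial(1, theta, size=r)                    # keep/drop mask per hidden unit
        total += np.linalg.norm(X - (U * b) @ V.T / theta) ** 2
    return total / num_samples

plain_loss = np.linalg.norm(X - U @ V.T) ** 2
induced_reg = (1 - theta) / theta * np.sum(np.sum(U**2, axis=0) * np.sum(V**2, axis=0))

print("Monte Carlo dropout objective :", expected_dropout_loss())
print("squared loss + induced penalty:", plain_loss + induced_reg)
```

The two printed values should agree up to Monte Carlo error, showing that dropout training of this network is, in expectation, squared-loss training with an explicit per-unit penalty on ||u_i||^2 ||v_i||^2.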
