Dropout as a Low-Rank Regularizer for Matrix Factorization

Regularization for matrix factorization (MF) and approximation problems has been carried out in many different ways. Due to its popularity in deep learning, dropout has also been applied to this class of problems. Despite its solid empirical performance, the theoretical properties of dropout as a regularizer remain elusive in this setting. In this paper, we present a theoretical analysis of dropout for MF, where Bernoulli random variables are used to drop columns of the factors. We show that dropout is equivalent to a fully deterministic MF model in which the factors are regularized by the sum of products of the squared Euclidean norms of corresponding columns. Additionally, we examine the case of a factorization of variable size and prove that dropout attains the global minimum of a convex approximation problem with (squared) nuclear-norm regularization. As a result, we conclude that dropout can be used as a low-rank regularizer with data-dependent singular-value thresholding.
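As a minimal sketch of the claimed equivalence (the notation below is assumed here rather than taken from the paper), let X in R^{n x m} be approximated by U V^T with U in R^{n x d}, V in R^{m x d}, let u_i, v_i denote their columns, and let r_1, ..., r_d be i.i.d. Bernoulli(theta) variables that keep or drop entire columns. Taking the expectation of the rescaled dropout objective, the cross terms vanish by independence and Var(r_i / theta) = (1 - theta) / theta, which gives

\[
\mathbb{E}_{r}\left\| X - \tfrac{1}{\theta}\sum_{i=1}^{d} r_i\, u_i v_i^{\top} \right\|_F^2
= \left\| X - U V^{\top} \right\|_F^2
+ \frac{1-\theta}{\theta} \sum_{i=1}^{d} \| u_i \|_2^2\, \| v_i \|_2^2 .
\]

That is, minimizing the expected dropout objective coincides with the deterministic MF problem regularized by the sum of products of squared column norms described above; when the number of columns d is allowed to vary, the paper connects this penalty to (squared) nuclear-norm regularization.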
