Fast Rates for Empirical Risk Minimization of Strict Saddle Problems

We derive bounds on the sample complexity of empirical risk minimization (ERM) for non-convex risks that satisfy the strict saddle property. Recent progress in non-convex optimization has yielded efficient algorithms for minimizing such functions. Our results imply that these efficient algorithms are statistically stable and generalize well. In particular, we derive fast rates that resemble the bounds often attained in the strongly convex setting. We specialize our bounds to Principal Component Analysis and Independent Component Analysis. Our results and techniques may pave the way for statistical analyses of additional strict saddle problems.
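As a minimal sketch (not taken from the paper), the snippet below illustrates ERM for PCA, one of the strict saddle problems the abstract specializes to: the population risk R(w) = -w^T Sigma w over the unit sphere has only strict saddle points besides its global minimizers, and the empirical risk minimizer is the top eigenvector of the empirical covariance. The dimension, spectrum, and sample sizes are arbitrary choices for illustration; the decreasing excess population risk only gestures at the kind of rate the paper analyzes.

```python
# Hypothetical illustration of ERM for PCA as a strict saddle problem.
import numpy as np

rng = np.random.default_rng(0)
d = 10

# Population covariance with a spectral gap (eigenvalues 2, 1, ..., 1),
# so the population risk has a unique (up to sign) global minimizer.
eigvals = np.ones(d)
eigvals[0] = 2.0
Sigma = np.diag(eigvals)

def population_risk(w):
    # R(w) = -w^T Sigma w for unit-norm w.
    return -w @ Sigma @ w

def erm_solution(X):
    # Empirical risk \hat{R}(w) = -(1/n) * sum_i (w^T x_i)^2; its minimizer
    # over the unit sphere is the top eigenvector of the empirical covariance.
    cov = X.T @ X / X.shape[0]
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    return eigenvectors[:, -1]

opt = population_risk(np.eye(d)[0])  # optimal population risk = -2
for n in [100, 1000, 10000]:
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    w_hat = erm_solution(X)
    excess = population_risk(w_hat) - opt
    print(f"n = {n:6d}  excess population risk = {excess:.5f}")
```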
