The estimation error of general first order methods

Modern large-scale statistical models require the estimation of thousands to millions of parameters. This is often accomplished by iterative algorithms such as gradient descent, projected gradient descent, or their accelerated versions. What are the fundamental limits of these approaches? This question is well understood from an optimization viewpoint when the underlying objective is convex: work in this area characterizes the gap to global optimality as a function of the number of iterations. However, these results have only indirect implications for the gap to statistical optimality. Here we consider two families of high-dimensional estimation problems, high-dimensional regression and low-rank matrix estimation, and introduce a class of "general first order methods" that aim to estimate the underlying parameters efficiently. This class of algorithms is broad enough to include not only classical first order optimization (for convex and non-convex objectives) but also other types of algorithms. Under a random design assumption, we derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge. These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches the lower bounds up to asymptotically negligible terms. We illustrate our general results through applications to sparse phase retrieval and sparse principal component analysis.
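
To make the setting concrete, the sketch below implements one representative member of this class of algorithms: proximal gradient descent with soft-thresholding (ISTA) for sparse linear regression under a Gaussian random design. This is a minimal illustration of a classical first order method rather than the general construction studied in the paper; the problem sizes, step size, and regularization level are arbitrary choices made for the example.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entrywise soft-thresholding: the proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(y, A, lam, n_iter=200):
    """Proximal gradient descent (ISTA) for the Lasso objective
    (1/2) * ||y - A x||^2 + lam * ||x||_1.
    Each iteration combines one gradient step on the smooth part with a
    proximal (soft-thresholding) step, i.e. a classical first order method."""
    n, p = A.shape
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / L, with L the squared spectral norm of A
    x = np.zeros(p)
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)             # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Toy instance: Gaussian random design and a sparse signal (sizes are illustrative).
rng = np.random.default_rng(0)
n, p, k = 200, 400, 10
A = rng.standard_normal((n, p)) / np.sqrt(n)
x_star = np.zeros(p)
x_star[:k] = rng.standard_normal(k)
y = A @ x_star + 0.1 * rng.standard_normal(n)

x_hat = ista(y, A, lam=0.05)
print("relative estimation error:", np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star))
```

Note that the iteration accesses the data only through matrix-vector products with A and its transpose; this is the sense in which it is a first order method, and it is the estimation error of such iterations, rather than their optimization error, that the lower bounds above concern.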
