Solving Equations of Random Convex Functions via Anchored Regression

We consider the problem of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally require solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides flexibility, since structural priors (e.g., sparsity) can be incorporated directly. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations, the amount of noise present, a measure of the statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (which determines the linear functional to maximize) directly from the observed data.
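As a rough illustration of "maximizing a linear functional over a convex set", the sketch below works through one hypothetical instance that is not taken from the paper: noiseless equations of the form |<a_i, x>| = y_i with real Gaussian vectors a_i. Relaxing each equation to the convex constraint |<a_i, x>| <= y_i turns the program into a linear program. The function name, the problem sizes, and the way the anchor is built here (a perturbed copy of the true solution, rather than a data-driven recipe) are all assumptions made for the example, and no regularizer is included.

```python
import numpy as np
from scipy.optimize import linprog


def anchored_regression_abs(A, y, anchor):
    """Maximize <anchor, x> subject to |<a_i, x>| <= y_i for every row a_i of A.

    Hypothetical helper for illustration; not the paper's general estimator.
    """
    # |<a_i, x>| <= y_i is the pair of linear inequalities
    #   <a_i, x> <= y_i   and   -<a_i, x> <= y_i.
    A_ub = np.vstack([A, -A])
    b_ub = np.concatenate([y, y])
    # linprog minimizes, so negate the anchor to maximize <anchor, x>.
    # bounds=(None, None) makes every coordinate of x unconstrained in sign.
    result = linprog(c=-anchor, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


# Synthetic data (assumed sizes): d unknowns, n random equations.
rng = np.random.default_rng(0)
d, n = 5, 60
x_star = rng.standard_normal(d)        # true solution
A = rng.standard_normal((n, d))        # rows are the random vectors a_i
y = np.abs(A @ x_star)                 # noiseless observations

# Assumed anchor: a noisy copy of x_star, standing in for a data-driven one.
anchor = x_star + 0.1 * rng.standard_normal(d)

x_hat = anchored_regression_abs(A, y, anchor)
print("relative error:", np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star))
```

Because the true solution satisfies every relaxed constraint, the maximizer can only do at least as well as the true solution on the linear objective. Whether the two coincide depends on the number of equations and on how well the anchor correlates with the true solution, which is the kind of question the guarantees described above address.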
