Max-Affine Regression: Provable, Tractable, and Near-Optimal Statistical Estimation

Max-affine regression refers to a model in which the unknown regression function is a maximum of $k$ unknown affine functions for a fixed $k \geq 1$. This generalizes linear regression and (real) phase retrieval, and is closely related to convex regression. Working within a non-asymptotic framework, we study this problem in the high-dimensional setting assuming that $k$ is a fixed constant, and focus on estimating the unknown coefficients of the affine functions underlying the model. We analyze a natural alternating minimization (AM) algorithm for the non-convex least squares objective when the design is random. We show that the AM algorithm, when initialized suitably, converges with high probability and at a geometric rate to a small ball around the optimal coefficients. To initialize the algorithm, we propose and analyze a combination of a spectral method and a random search scheme in a low-dimensional space, which may be of independent interest. The final rate that we obtain is near-parametric and minimax optimal (up to a poly-logarithmic factor) as a function of the dimension, sample size, and noise variance. In that sense, our approach should be viewed as a direct and implementable method of enforcing regularization to alleviate the curse of dimensionality in problems of the convex regression type. As a by-product of our analysis, we also obtain guarantees on a classical algorithm for the phase retrieval problem under considerably weaker assumptions on the design distribution than were previously known. Numerical experiments illustrate the sharpness of our bounds in the various problem parameters.
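
As an illustration of the alternating minimization idea described above, the following is a minimal sketch and not the authors' implementation. It assumes observations of the form $y_i = \max_{1 \leq j \leq k} (\langle a_j, x_i \rangle + b_j) + \text{noise}$ and assumes that one AM step assigns each sample to the affine piece currently attaining the maximum, then refits each piece by ordinary least squares on its assigned samples. The function names, the problem sizes, and the initialization by a small perturbation of the true coefficients are all hypothetical choices made for the demo; the spectral and random search initialization from the abstract is not implemented here.

```python
import numpy as np

def add_intercept(X):
    """Append a column of ones so each affine piece is a single coefficient row."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def am_max_affine(X, y, theta_init, n_iter=30):
    """Hypothetical AM sketch: alternate argmax assignment and per-piece least squares.

    theta has shape (k, d + 1); row j holds (a_j, b_j).
    """
    Xa = add_intercept(X)
    theta = theta_init.copy()
    k, p = theta.shape
    for _ in range(n_iter):
        # Step 1: assign each sample to the piece attaining the current maximum.
        labels = np.argmax(Xa @ theta.T, axis=1)
        # Step 2: refit each piece by least squares on its assigned samples.
        for j in range(k):
            mask = labels == j
            if mask.sum() >= p:  # skip pieces with too few samples to refit
                theta[j] = np.linalg.lstsq(Xa[mask], y[mask], rcond=None)[0]
    return theta

# Synthetic demo with a Gaussian design (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, d, k, sigma = 2000, 5, 3, 0.05
theta_true = np.zeros((k, d + 1))
theta_true[np.arange(k), np.arange(k)] = 1.0  # piece j is the coordinate function x_j
X = rng.standard_normal((n, d))
y = (add_intercept(X) @ theta_true.T).max(axis=1) + sigma * rng.standard_normal(n)

# Stand-in for a "suitable" initialization: a small perturbation of the truth.
theta_init = theta_true + 0.1 * rng.standard_normal((k, d + 1))
theta_hat = am_max_affine(X, y, theta_init)

err_init = np.linalg.norm(theta_init - theta_true)
err_final = np.linalg.norm(theta_hat - theta_true)
print(f"initial error: {err_init:.4f}, error after AM: {err_final:.4f}")
```

Because the demo starts close to the true coefficients, the rows of the estimate line up with the rows of the truth and no permutation matching is needed. With an arbitrary initialization, the pieces would be recovered only up to relabeling, and convergence would not be expected in general.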
