Statistical Analysis of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization

This paper has two main goals: (a) establish several statistical properties (consistency, asymptotic distributions, and convergence rates) of stationary solutions and values of a class of coupled nonconvex and nonsmooth empirical risk minimization problems, and (b) validate these properties on a noisy amplitude-based phase retrieval problem, which is of considerable current interest. Derived from sampled data, these empirical risk minimization problems are the computational workhorse for a population risk model, which minimizes the expected value of a random functional. When these problems are nonconvex, computing their globally optimal solutions is elusive. Moreover, the expectation cannot be evaluated for general probability distributions. Together, these two facts make it necessary to justify whether the stationary solutions of the empirical problems are practical approximations of the stationary solutions of the population problem. When these two features, general distributions and nonconvexity, are coupled with nondifferentiability, which often renders the problems "non-Clarke regular", this justification becomes challenging. Our work addresses this challenge in an algorithm-free setting. The resulting analysis therefore differs from much of the recent literature, which relies on local search algorithms. Furthermore, by supplementing the classical minimizer-centric analysis, our results offer a first step toward closing the gap between computational optimization and the asymptotic analysis of coupled nonconvex nonsmooth statistical estimation problems. They enrich the former with statistical properties of the solutions obtained in practice, and they give the latter a more practical focus on computational tractability.
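As a concrete illustration of the empirical/population pairing described above, the following Python sketch uses one common real-valued form of noisy amplitude-based phase retrieval. It assumes Gaussian sensing vectors a ~ N(0, I_d), measurements b = |a^T x*| + eps with eps ~ N(0, sigma^2) independent of a, and the loss l(x; a, b) = (|a^T x| - b)^2. These are illustrative choices, not necessarily the paper's exact model. The empirical risk f_N is the sample average of this loss. The population risk F(x) = E[l(x; a, b)] has a closed form under these assumptions, via the Gaussian identity E|UV| = (2/pi)(sqrt(1 - rho^2) + rho arcsin(rho)) for standard jointly Gaussian U, V with correlation rho. Expanding (|t| - b)^2 = t^2 - 2b|t| + b^2 shows where the difficulty comes from. For b > 0, the term -2b|t| is a concave kink at t = 0, which is a standard example of nonsmoothness that is not Clarke regular. In addition, F(x) = F(-x) and F(0) > F(+-x*), so the problem is nonconvex. The helper names (sample_data, empirical_risk, population_risk), the test point x, sigma, and the seed are all hypothetical. The sketch only shows pointwise convergence f_N(x) -> F(x) by the law of large numbers. It does not reproduce the paper's analysis of stationary solutions, which needs arguments well beyond pointwise convergence.

```python
import numpy as np


def sample_data(n, x_star, sigma, rng):
    """Hypothetical helper: draw b_i = |a_i^T x_star| + eps_i with a_i ~ N(0, I_d)."""
    A = rng.standard_normal((n, x_star.size))
    b = np.abs(A @ x_star) + sigma * rng.standard_normal(n)
    return A, b


def empirical_risk(x, A, b):
    """f_N(x) = (1/N) * sum_i (|a_i^T x| - b_i)^2, nonconvex and nonsmooth in x."""
    return float(np.mean((np.abs(A @ x) - b) ** 2))


def population_risk(x, x_star, sigma):
    """Closed-form F(x) = E[(|a^T x| - b)^2] under the assumed Gaussian model.

    With U = a^T x and V = a^T x_star jointly Gaussian (zero mean, correlation rho):
    F(x) = ||x||^2 + ||x_star||^2 - 2 E|U V| + sigma^2, where
    E|U V| = ||x|| ||x_star|| (2/pi) (sqrt(1 - rho^2) + rho * arcsin(rho)).
    """
    nx, ns = np.linalg.norm(x), np.linalg.norm(x_star)
    if nx == 0.0 or ns == 0.0:
        # The cross term E|U V| vanishes when either vector is zero.
        return float(nx**2 + ns**2 + sigma**2)
    # Clipping guards sqrt/arcsin against rounding slightly outside [-1, 1].
    rho = np.clip(x @ x_star / (nx * ns), -1.0, 1.0)
    cross = nx * ns * (2.0 / np.pi) * (np.sqrt(1.0 - rho**2) + rho * np.arcsin(rho))
    return float(nx**2 + ns**2 - 2.0 * cross + sigma**2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, sigma = 5, 0.3
    x_star = np.ones(d) / np.sqrt(d)          # unit-norm ground-truth signal (assumed)
    x = np.array([1.0, -0.5, 0.0, 0.2, 0.3])  # arbitrary fixed test point (hypothetical)
    for n in (100, 10_000, 200_000):
        A, b = sample_data(n, x_star, sigma, rng)
        print(f"N={n:>7}: f_N(x)={empirical_risk(x, A, b):.4f}  "
              f"F(x)={population_risk(x, x_star, sigma):.4f}")
```

Running the script prints f_N(x) beside F(x) for increasing N. The gap should shrink roughly like 1/sqrt(N), as expected for a sample mean, although the exact numbers depend on the seed. Note that F(x*) reduces to sigma^2, the noise floor, and that f_N inherits the sign symmetry f_N(x) = f_N(-x), which is the source of nonconvexity.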
