Fast methods for nonsmooth nonconvex minimization

We propose a new class of algorithms for nonsmooth, nonconvex problems that are expressible as compositions of nonsmooth nonconvex functions with smooth maps. In many cases, these algorithms require only (1) least squares solvers and (2) proximal operators of separable functions. An immediate consequence is that direct factorizations and SVDs, as well as accelerated iterative methods such as preconditioned CG, fast-gradient methods, and (L)BFGS, can be leveraged to solve nonsmooth nonconvex problems. We provide a convergence analysis and empirical results for a selected set of representative applications, including phase retrieval, stochastic shortest path, semi-supervised support vector machines (S$^3$VM), and exact robust PCA (RPCA). In all of these applications, we observe linear and superlinear empirical rates of convergence. In particular, we need fewer than 20 iterations to solve both exact RPCA and large-scale robust phase retrieval problems. As far as we know, the proposed algorithm is the first available for solving kernel S$^3$VM problems.
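To make the two building blocks named above concrete, the following minimal sketch shows a dense least squares solve and the proximal operator of one familiar separable function. It is an illustration under stated assumptions, not the proposed algorithm: the function names `least_squares` and `prox_l1` are hypothetical, and the (convex) $\ell_1$ norm is chosen only because its proximal operator has a well-known closed form.

```python
import numpy as np

def least_squares(A, b):
    """Solve min_x ||A x - b||_2^2 with NumPy's dense direct solver."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1, i.e. elementwise soft-thresholding.

    Because the l1 norm is separable, the prox acts on each coordinate
    independently. (Illustrative choice; not taken from the paper.)
    """
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Small synthetic check: a consistent overdetermined system is recovered exactly.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
x_hat = least_squares(A, A @ x_true)

# Soft-thresholding shrinks each entry toward zero by t and zeros out small ones.
shrunk = prox_l1(np.array([3.0, -0.5, 1.0]), 1.0)
```

For large problems, the dense solve could be swapped for an iterative method such as preconditioned CG, which is the kind of substitution the abstract alludes to.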
