Distance majorization and its applications

The problem of minimizing a continuously differentiable convex function over an intersection of closed convex sets is ubiquitous in applied mathematics. It is particularly interesting when projecting onto each individual set is easy but projecting onto their intersection is nontrivial. Algorithms based on Newton's method, such as the interior point method, are viable for small- to medium-scale problems. However, modern applications in statistics, engineering, and machine learning pose problems with tens of thousands of parameters or more. We revisit this convex programming problem and propose an algorithm that scales well with dimensionality. Our proposal is an instance of a sequential unconstrained minimization technique and revolves around three ideas: the majorization-minimization principle, the classical penalty method for constrained optimization, and quasi-Newton acceleration of fixed-point algorithms. The performance of our distance majorization algorithms is illustrated in several applications.
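As a rough illustration of how the first two ideas can combine, the sketch below minimizes a penalized objective f(x) + (rho/2) * sum_i dist(x, C_i)^2 for an increasing sequence of penalty constants rho. Each squared distance is majorized at the current iterate x_k by ||x - P_i(x_k)||^2, where P_i is the projection onto C_i, so every step reduces to an unconstrained minimization. This is a minimal sketch under stated assumptions, not the paper's implementation: the objective f(x) = (1/2)||x - y||^2, the two constraint sets, the penalty schedule, the iteration counts, and all function names are hypothetical choices made for illustration, and the quasi-Newton acceleration is omitted.

```python
import numpy as np


def project_nonneg(x):
    """Project onto the nonnegative orthant {x : x >= 0}."""
    return np.maximum(x, 0.0)


def project_sum_to_one(x):
    """Project onto the hyperplane {x : sum(x) = 1}."""
    return x - (x.sum() - 1.0) / x.size


def distance_majorization(y, projections,
                          rhos=(1.0, 10.0, 100.0, 1e3, 1e4),
                          inner_iters=200):
    """Approximately minimize 0.5 * ||x - y||^2 over the intersection of sets.

    Assumed setup (illustrative only): for each penalty constant rho, the
    surrogate 0.5 * ||x - y||^2 + (rho / 2) * sum_i ||x - P_i(x_k)||^2
    is minimized in closed form, which gives the update below.
    """
    x = np.asarray(y, dtype=float).copy()
    m = len(projections)
    for rho in rhos:                      # classical penalty method: grow rho
        for _ in range(inner_iters):      # MM iterations at fixed rho
            anchors = sum(P(x) for P in projections)
            x = (y + rho * anchors) / (1.0 + rho * m)
    return x


# Example: project y onto the probability simplex, written as the
# intersection of the nonnegative orthant and the hyperplane sum(x) = 1.
y = np.array([0.5, 0.2, -0.3])
x_hat = distance_majorization(y, [project_nonneg, project_sum_to_one])
print(np.round(x_hat, 3))
```

Because the penalty constant stays finite, the result satisfies the constraints only approximately; the constraint violation shrinks as the final value of rho grows.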
