Extensions to the Proximal Distance Method of Constrained Optimization

This paper studies the problem of minimizing a loss $f(x)$ subject to constraints of the form $Dx \in S$, where $S$ is a closed set, convex or not, and $D$ is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method of optimization with the proximal distance principle. The latter is driven by minimization of penalized objectives $f(x) + \frac{\rho}{2}\operatorname{dist}(Dx, S)^2$ involving large tuning constants $\rho$ and the squared Euclidean distance of $Dx$ from $S$. The next iterate $x_{n+1}$ of the corresponding proximal distance algorithm is constructed from the current iterate $x_n$ by minimizing the majorizing surrogate function $f(x) + \frac{\rho}{2}\|Dx - \mathcal{P}_S(Dx_n)\|^2$, where $\mathcal{P}_S$ denotes Euclidean projection onto $S$. For fixed $\rho$, a subanalytic loss $f(x)$, and a subanalytic constraint set $S$, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare their results to those delivered by the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest descent variant on high-dimensional problems. Julia code to replicate all of our experiments can be found at https://github.com/alanderos91/ProximalDistanceAlgorithms.jl
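The surrogate minimization described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation and not the API of ProximalDistanceAlgorithms.jl; it is written in Python rather than Julia, and every function name is hypothetical. It assumes a least-squares loss $f(x) = \frac{1}{2}\|x - y\|^2$, takes $D$ to be the forward-difference matrix, and takes $S$ to be the nonnegative orthant, so that $Dx \in S$ forces $x$ to be nondecreasing (isotonic regression). Under these assumptions each surrogate minimization reduces to the tridiagonal system $(I + \rho D^T D)x = y + \rho D^T \mathcal{P}_S(Dx_n)$, solved here by the Thomas algorithm with $\rho$ held fixed.

```python
def project_nonnegative(v):
    """Euclidean projection onto S = nonnegative orthant (assumed constraint set)."""
    return [max(0.0, vi) for vi in v]


def forward_difference(x):
    """Apply the assumed fusion matrix D: (Dx)_i = x_{i+1} - x_i."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def forward_difference_transpose(p, n):
    """Apply D^T to a vector p of length n - 1."""
    out = [0.0] * n
    for i, pi in enumerate(p):
        out[i] -= pi
        out[i + 1] += pi
    return out


def solve_surrogate_system(rho, rhs):
    """Solve (I + rho * D^T D) x = rhs with the Thomas algorithm (needs len(rhs) >= 2)."""
    n = len(rhs)
    off = -rho                        # constant off-diagonal entry
    diag = [1.0 + 2.0 * rho] * n      # interior diagonal entries
    diag[0] = diag[-1] = 1.0 + rho    # boundary diagonal entries
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):             # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):    # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def penalized_objective(x, y, rho):
    """f(x) + (rho / 2) * dist(Dx, S)^2 for the assumed f, D, and S."""
    loss = 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    dist2 = sum(min(0.0, di) ** 2 for di in forward_difference(x))
    return loss + 0.5 * rho * dist2


def proximal_distance_isotonic(y, rho=100.0, max_iter=10000, tol=1e-12):
    """Iterate x_{n+1} = argmin f(x) + (rho / 2) * ||Dx - P_S(D x_n)||^2 for fixed rho."""
    n = len(y)
    x = list(y)
    for _ in range(max_iter):
        p = project_nonnegative(forward_difference(x))
        dt_p = forward_difference_transpose(p, n)
        rhs = [yi + rho * ti for yi, ti in zip(y, dt_p)]
        x_new = solve_surrogate_system(rho, rhs)
        change = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            break
    return x


if __name__ == "__main__":
    y = [3.0, 1.0, 2.0]
    x = proximal_distance_isotonic(y, rho=100.0)
    print(x, penalized_objective(x, y, 100.0))
```

With $\rho$ fixed, the iterates settle at a stationary point of the penalized objective rather than at an exactly feasible point; for the data above the result lies close to the isotonic fit $(2, 2, 2)$, with a constraint violation on the order of $1/\rho$. Convergence slows as $\rho$ grows, so increasing $\rho$ gradually across outer iterations is one way to tighten feasibility without starting from a poorly conditioned problem.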
