Parameter Selection and Preconditioning for a Graph Form Solver

In the chapter “Block splitting for distributed optimization”, Parikh and Boyd describe a method for solving convex optimization problems in which each iteration involves evaluating a proximal operator and projecting onto a subspace. In this chapter, we address two critical practical issues: how to select the proximal parameter in each iteration, and how to scale the original problem variables, so as to achieve reliable practical performance. The resulting method has been implemented as an open-source software package called POGS (Proximal Graph Solver) that targets multi-core and GPU-based systems and has been tested on a wide variety of practical problems. Numerical results show that POGS can solve very large problems (with, say, a billion coefficients in the data) to modest accuracy in a few tens of seconds, whereas similar problems take many hours using interior-point methods.
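
To make the structure of one iteration concrete, the following is a minimal sketch of a proximal-plus-projection loop for a graph form problem, minimize f(y) + g(x) subject to y = Ax. It is an illustration only and is not the POGS implementation: the choice of a lasso objective, the function names `soft_threshold` and `graph_form_lasso`, the fixed proximal parameter `rho`, and the dense Cholesky solve are all assumptions made for this example. The sketch uses neither the adaptive parameter selection nor the variable scaling that this chapter is about.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def graph_form_lasso(A, b, lam, rho=1.0, iters=500):
    """Sketch: minimize 0.5*||y - b||^2 + lam*||x||_1 subject to y = A x.

    rho is held fixed here (an assumption of this sketch); choosing it
    per iteration is one of the practical issues the chapter addresses.
    """
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)      # iterates on the graph {y = A x}
    xt, yt = np.zeros(n), np.zeros(m)    # scaled dual variables
    # Factor I + A^T A once; it is reused by every projection step.
    L = np.linalg.cholesky(np.eye(n) + A.T @ A)
    for _ in range(iters):
        # Proximal step: g(x) = lam*||x||_1 and f(y) = 0.5*||y - b||^2.
        x_half = soft_threshold(x - xt, lam / rho)
        y_half = (b + rho * (y - yt)) / (1.0 + rho)
        # Projection of (c, d) onto the subspace {(x, y) : y = A x}:
        # x = (I + A^T A)^{-1} (c + A^T d), y = A x.
        c, d = x_half + xt, y_half + yt
        rhs = c + A.T @ d
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        y = A @ x
        # Dual update.
        xt = xt + x_half - x
        yt = yt + y_half - y
    return x_half, y_half

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    b = rng.standard_normal(20)
    x, y = graph_form_lasso(A, b, lam=0.1)
    print("x =", x)
```

With `lam=0.0` the problem reduces to ordinary least squares, which gives a convenient sanity check against `np.linalg.lstsq`. A production solver would typically replace the two generic `np.linalg.solve` calls with triangular solves or an iterative method.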
