Differential Inclusions for Modeling Nonsmooth ADMM Variants: A Continuous Limit Theory

Recently, a great deal of research attention has been devoted to understanding the convergence behavior of first-order methods. One line of this research analyzes first-order methods using tools from continuous dynamical systems, such as ordinary differential equations and differential inclusions. These results shed light on first-order methods from a non-optimization point of view. The alternating direction method of multipliers (ADMM) is a widely used first-order method for solving optimization problems arising in machine learning and statistics, and it is important to investigate its behavior using these techniques from dynamical systems. Existing work along this line has mainly focused on problems with smooth objective functions, which excludes many important applications that are traditionally solved by ADMM variants. In this paper, we analyze several well-known and widely used ADMM variants for nonsmooth optimization problems using the tools of differential inclusions. In particular, we analyze the convergence behavior of linearized ADMM and gradient-based ADMM for nonsmooth problems and show their connections with dynamical systems. We anticipate that these results will provide new insights into ADMM for solving nonsmooth problems.
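
For orientation, the following is a minimal sketch of a linearized ADMM iteration for the generic two-block problem min_{x,z} f(x) + g(z) subject to Ax + Bz = c; the penalty parameter rho, the proximal parameter tau, and the particular linearization shown are standard illustrative choices and are not taken from this paper.

% Illustrative linearized ADMM for min_{x,z} f(x) + g(z) s.t. Ax + Bz = c.
% rho > 0 is the augmented Lagrangian penalty and tau > 0 the proximal step size
% (generic notation, assumed here for illustration only; not the paper's scheme).
\[
\begin{aligned}
x^{k+1} &\in \operatorname*{arg\,min}_{x}\ f(x)
  + \big\langle \rho A^{\top}\!\big(Ax^{k}+Bz^{k}-c+\lambda^{k}/\rho\big),\,x\big\rangle
  + \tfrac{1}{2\tau}\,\|x-x^{k}\|^{2},\\
z^{k+1} &\in \operatorname*{arg\,min}_{z}\ g(z)
  + \tfrac{\rho}{2}\,\big\|Ax^{k+1}+Bz-c+\lambda^{k}/\rho\big\|^{2},\\
\lambda^{k+1} &= \lambda^{k} + \rho\,\big(Ax^{k+1}+Bz^{k+1}-c\big).
\end{aligned}
\]

When f or g is nonsmooth, the optimality conditions of these subproblems involve the subdifferentials \partial f and \partial g rather than gradients, which is the heuristic reason a continuous-time limit of such iterations is modeled by a differential inclusion (e.g., a relation of the form \dot{x}(t) \in -\partial f(x(t)) - A^{\top}\lambda(t)) rather than an ordinary differential equation; the precise limiting dynamics are the subject of the paper and may differ from this sketch.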
