Implementing the ADMM to Big Datasets: A Case Study of LASSO

The alternating direction method of multipliers (ADMM) has been widely used for a broad range of applications in the literature. When big datasets with high-dimensional variables are considered, the subproblems arising from the ADMM must be solved inexactly, even though theoretically they may have closed-form solutions. Such a scenario immediately raises mathematical questions such as how accurately these subproblems should be solved and whether convergence can still be guaranteed. Despite the popularity of the ADMM, little seems to be known in these regards. In this paper, we look into the mathematical details of implementing the ADMM in such big-data scenarios. More specifically, we focus on the convex programming case where the objective of the model under discussion contains a quadratic function component with extremely high-dimensional variables, so that a huge-scale system of linear equations must be solved at each iteration of the ADMM. We show that there is no need (indeed it is impossible) to solve this linear system exactly or too accurately, and we propose an automatically adjustable inexactness criterion for solving these linear systems inexactly. We further identify safeguard numbers for the internally nested iterations that suffice to ensure this inexactness criterion when the linear systems are solved by standard numerical linear algebra solvers. The convergence, together with a worst-case convergence rate measured by the iteration complexity, is rigorously established for the ADMM with inexactly solved subproblems. Numerical experiments on big LASSO datasets with millions of variables are reported to demonstrate the efficiency of this inexact implementation of the ADMM.
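To make the setting concrete, the following Python sketch shows ADMM applied to the LASSO, minimize (1/2)||Ax - b||^2 + lam*||z||_1 subject to x - z = 0, where the x-subproblem is the linear system (A^T A + rho*I)x = A^T b + rho*(z - u) and is solved only approximately. This is a minimal illustration, not the paper's algorithm: the conjugate-gradient inner solver, the 1/(k+1)^2 tolerance schedule, and the parameter values lam, rho, and max_iter are assumptions standing in for the paper's adjustable inexactness criterion and safeguard numbers.

```python
# Minimal sketch: inexact ADMM for LASSO with an approximately solved x-subproblem.
# The tolerance schedule and parameters are illustrative, not the paper's choices.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def soft_threshold(v, kappa):
    """Proximal operator of kappa*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def inexact_admm_lasso(A, b, lam=0.1, rho=1.0, max_iter=200):
    m, n = A.shape
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    Atb = A.T @ b
    # Matrix-free operator for A^T A + rho*I: A^T A is never formed explicitly,
    # which matters when n is in the millions.
    M = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v) + rho * v)
    for k in range(max_iter):
        # x-subproblem: (A^T A + rho*I) x = A^T b + rho*(z - u), solved inexactly
        # by CG, warm-started at the previous x, with a tolerance that tightens
        # as the outer iterations proceed (illustrative 1/(k+1)^2 schedule).
        rhs = Atb + rho * (z - u)
        tol_k = min(1e-3, 1.0 / (k + 1) ** 2)
        x, _ = cg(M, rhs, x0=x, rtol=tol_k)  # older SciPy uses tol= instead of rtol=
        # z-subproblem: closed-form soft-thresholding.
        z_old = z
        z = soft_threshold(x + u, lam / rho)
        # Dual (multiplier) update.
        u = u + x - z
        # Stop when primal and dual residuals are both small.
        if np.linalg.norm(x - z) < 1e-6 and rho * np.linalg.norm(z - z_old) < 1e-6:
            break
    return z
```

The matrix-free operator and the warm-started inner solver reflect the point of the paper at a high level: for such problem sizes the linear system cannot realistically be solved exactly, and loosening the inner tolerance early while tightening it over the outer iterations keeps the total inner work modest without (under a suitable criterion) sacrificing convergence.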
