ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning

Optimization approaches based on operator splitting are becoming popular for solving sparsity-regularized statistical machine learning models. While many have proposed fast algorithms to solve these problems for a single regularization parameter, conspicuously less attention has been given to computing regularization paths, that is, solving the optimization problems over the full range of regularization parameters to obtain a sequence of sparse models. In this chapter, we aim to quickly approximate the sequence of sparse models associated with regularization paths for the purposes of statistical model selection, using the building blocks of a classical operator splitting method, the Alternating Direction Method of Multipliers (ADMM). We begin by proposing an ADMM algorithm that uses warm-starts to quickly compute the regularization path. Then, by employing approximations along this warm-starting ADMM algorithm, we propose a novel concept that we term the ADMM Algorithmic Regularization Path. Our method can quickly outline the sequence of sparse models associated with the regularization path, often in less computational time than the ADMM algorithm needs to solve the problem for a single regularization parameter. We demonstrate the applicability and substantial computational savings of our approach through three popular examples: sparse linear regression, reduced-rank multi-task learning, and convex clustering.
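
To make the warm-starting idea concrete, the following is a minimal sketch for the sparse linear regression case, assuming the standard scaled-form ADMM updates for the lasso objective 0.5*||y - X b||^2 + lam*||b||_1. The function names (`soft_threshold`, `admm_lasso_path`), the parameters `rho` and `n_inner`, and the use of NumPy are illustrative assumptions and are not taken from the chapter. Setting `n_inner=1` takes a single ADMM step per regularization value, which is only loosely in the spirit of the approximation described above and should not be read as the authors' exact algorithm.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def admm_lasso_path(X, y, lambdas, rho=1.0, n_inner=200):
    """Warm-started ADMM over a grid of regularization values (hypothetical helper).

    For each lam in `lambdas`, runs `n_inner` scaled-form ADMM iterations on
    0.5 * ||y - X b||^2 + lam * ||b||_1, starting from the (z, u) reached at
    the previous lam. Returns one sparse estimate z per lam.
    """
    p = X.shape[1]
    # The linear system is the same for every lam, so it is prepared once.
    H = np.linalg.inv(X.T @ X + rho * np.eye(p))
    Xty = X.T @ y

    z = np.zeros(p)  # sparse copy of the coefficients
    u = np.zeros(p)  # scaled dual variable
    path = []
    for lam in lambdas:
        for _ in range(n_inner):
            beta = H @ (Xty + rho * (z - u))          # ridge-like update
            z = soft_threshold(beta + u, lam / rho)   # sparsity-inducing update
            u = u + beta - z                          # dual update
        path.append(z.copy())  # (z, u) carry over as the warm start
    return np.array(path)


if __name__ == "__main__":
    # With an identity design the lasso solution is soft_threshold(y, lam),
    # which gives a simple check of the sketch.
    X = np.eye(4)
    y = np.array([3.0, -2.0, 0.5, 0.0])
    lambdas = [2.5, 1.0, 0.1]
    print(admm_lasso_path(X, y, lambdas))
    # One ADMM step per lam: a cheap, approximate outline of the path.
    print(admm_lasso_path(X, y, lambdas, n_inner=1))
```

Because the matrix in the first update does not depend on the regularization value, the same precomputed quantities serve the whole path, which is what makes warm-starting across the grid inexpensive in this sketch.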
