Paper Information - ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning

ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning

Optimization approaches based on operator splitting are becoming popular for solving sparsity-regularized statistical machine learning models. While many have proposed fast algorithms to solve these problems for a single regularization parameter, conspicuously less attention has been given to computing regularization paths, that is, solving the optimization problems over the full range of regularization parameters to obtain a sequence of sparse models. In this chapter, we aim to quickly approximate the sequence of sparse models associated with regularization paths for the purposes of statistical model selection by using the building blocks from a classical operator splitting method, the Alternating Direction Method of Multipliers (ADMM). We begin by proposing an ADMM algorithm that uses warm-starts to quickly compute the regularization path. Then, by employing approximations along this warm-starting ADMM algorithm, we propose a novel concept that we term the ADMM Algorithmic Regularization Path. Our method can quickly outline the sequence of sparse models associated with the regularization path in computational time that is often less than that of using the ADMM algorithm to solve the problem for a single regularization parameter. We demonstrate the applicability and substantial computational savings of our approach through three popular examples: sparse linear regression, reduced-rank multi-task learning, and convex clustering.
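
The warm-start idea in the abstract can be illustrated with a minimal Python sketch for sparse linear regression (the lasso). This is not the authors' implementation: the function names `soft_threshold` and `admm_lasso_path`, the penalty parameter `rho`, the stopping rule, and the choice of the standard ADMM lasso splitting (minimize 0.5*||y - Xb||^2 + lam*||z||_1 subject to b = z) are all assumptions made for illustration. The further approximations that define the ADMM Algorithmic Regularization Path are not specified in the abstract and are not reproduced here.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def admm_lasso_path(X, y, lambdas, rho=1.0, max_iter=500, tol=1e-6):
    """Warm-started ADMM over a grid of regularization parameters.

    Illustrative sketch only (hypothetical helper, not from the paper).
    Returns an array with one row of sparse coefficients per value in `lambdas`.
    """
    p = X.shape[1]
    # The matrix X'X + rho*I does not depend on lambda, so factor it once
    # and reuse the Cholesky factor along the whole path.
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))
    Xty = X.T @ y

    z = np.zeros(p)  # sparse copy of the coefficients
    u = np.zeros(p)  # scaled dual variable
    path = []
    for lam in lambdas:
        # Warm start: z and u are carried over from the previous lambda.
        for _ in range(max_iter):
            rhs = Xty + rho * (z - u)
            # Solve (X'X + rho*I) b = rhs via the factor L (A = L L').
            # A general solver is used for brevity.
            b = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
            z_old = z
            z = soft_threshold(b + u, lam / rho)
            u = u + b - z
            primal_res = np.linalg.norm(b - z)
            dual_res = rho * np.linalg.norm(z - z_old)
            if primal_res < tol and dual_res < tol:
                break
        path.append(z.copy())
    return np.array(path)
```

Because the iterates for one regularization parameter start from the solution for the previous one, each inner loop typically needs far fewer iterations than a cold start, which is the motivation for warm-starting along the path.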

Genevera I. Allen | Eric C. Chi | Yue Hu

[1] Jieping Ye, et al. Efficient Methods for Overlapping Group Lasso, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Trevor Hastie, et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of Statistical Software.

[3] N. Meinshausen, et al. Stability selection, 2008, arXiv:0809.2932.

[4] Genevera I. Allen, et al. Local-aggregate modeling for big data via distributed optimization: Applications to neuroimaging, 2014, Biometrics.

[5] Francis R. Bach, et al. Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties, 2011, ICML.

[6] Eric C. Chi, et al. Splitting Methods for Convex Clustering, 2013, Journal of Computational and Graphical Statistics.

[7] Eric P. Xing, et al. An Augmented Lagrangian Approach to Constrained MAP Inference, 2011, ICML.

[8] Jean-Luc Starck, et al. Sparse Solution of Underdetermined Systems of Linear Equations by Stagewise Orthogonal Matching Pursuit, 2012, IEEE Transactions on Information Theory.

[9] Patrick Danaher, et al. The joint graphical lasso for inverse covariance estimation across multiple classes, 2011, Journal of the Royal Statistical Society, Series B (Statistical Methodology).

[10] K. Lange, et al. Coordinate descent algorithms for lasso penalized regression, 2008, arXiv:0803.3876.

[11] R. Tibshirani, et al. Sparsity and smoothness via the fused lasso, 2005.

[12] William W. Hager, et al. Updating the Inverse of a Matrix, 1989, SIAM Rev.

[13] J. Franklin, et al. The elements of statistical learning: data mining, inference and prediction, 2005.

[14] Xiaoming Yuan, et al. Sparse and low-rank matrix decomposition via alternating direction method, 2013.

[15] M. R. Osborne, et al. A new approach to variable selection in least squares problems, 2000.

[16] Seungyeop Han, et al. Structured Learning of Gaussian Graphical Models, 2012, NIPS.

[17] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.

[18] Ming Yan, et al. Self Equivalence of the Alternating Direction Method of Multipliers, 2014, arXiv:1407.7400.