QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models

In this paper, we develop a family of algorithms for optimizing "superposition-structured" or "dirty" statistical estimators for high-dimensional problems involving the minimization of the sum of a smooth loss function and a hybrid regularization. Most current approaches are first-order methods, including proximal gradient and the Alternating Direction Method of Multipliers (ADMM). We propose a new family of second-order methods that approximate the loss function by a quadratic model; the superposition-structured regularizer then leads to a subproblem that can be solved efficiently by alternating minimization. We further propose a general active subspace selection approach that exploits the low-dimensional structure induced by the regularizers to speed up the solver, and we provide convergence guarantees for our algorithm. Empirically, we show that our approach is more than 10 times faster than state-of-the-art first-order methods on latent variable graphical model selection and multi-task learning problems with more than one regularizer. For these problems, ours appears to be the first algorithm to extend active subspace ideas to multiple regularizers.
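
To make the scheme concrete, below is a minimal sketch (Python/NumPy, not the authors' released code) of the outer quadratic approximation and the inner alternating minimization, specialized to a sparse-plus-low-rank decomposition Theta = S + L as in latent variable graphical model selection. The choice of penalties, step size, iteration counts, and the omission of the active subspace selection step and line search are all illustrative assumptions.

import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svd_threshold(A, tau):
    """Proximal operator of tau * ||.||_* (singular-value soft-thresholding)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_newton_dirty(grad, hess_vec, S, L, lam_s, lam_l,
                      step=1.0, outer_iters=20, inner_iters=5):
    """Sketch: minimize f(S + L) + lam_s*||S||_1 + lam_l*||L||_* by forming a
    quadratic model of the smooth loss f at each outer iterate and running
    alternating proximal updates over the two components on that model.
    `grad(Theta)` returns the gradient of f; `hess_vec(Theta, d)` returns the
    Hessian-vector product of f at Theta applied to d; `step` should be at
    most 1 / ||Hessian|| for the inner updates to be valid prox-gradient steps.
    """
    for _ in range(outer_iters):
        Theta = S + L
        g = grad(Theta)
        dS = np.zeros_like(S)
        dL = np.zeros_like(L)
        for _ in range(inner_iters):
            # Alternating minimization on the quadratic subproblem:
            # q(d) = <g, d> + 0.5 <d, H d>, so grad_q(d) = g + H d.
            q_grad = g + hess_vec(Theta, dS + dL)
            dS = soft_threshold(S + dS - step * q_grad, step * lam_s) - S
            q_grad = g + hess_vec(Theta, dS + dL)
            dL = svd_threshold(L + dL - step * q_grad, step * lam_l) - L
        # Take the full Newton-like step; a line search would be used in practice.
        S, L = S + dS, L + dL
    return S, L

# Toy usage (hypothetical loss): f(Theta) = 0.5 * ||Theta - X||_F^2, so
# grad = Theta - X and the Hessian-vector product is the identity.
X = np.random.randn(30, 30)
S, L = prox_newton_dirty(lambda T: T - X, lambda T, d: d,
                         np.zeros_like(X), np.zeros_like(X),
                         lam_s=0.1, lam_l=0.5)

For the latent variable graphical model problem in the paper, f would instead be the log-determinant Gaussian likelihood; any smooth loss with a gradient and Hessian-vector product oracle can be plugged in unchanged.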
