QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models

In this paper, we develop a family of algorithms for optimizing "superposition-structured" or "dirty" statistical estimators for high-dimensional problems involving the minimization of the sum of a smooth loss function and a hybrid regularization. Most current approaches are first-order methods, including proximal gradient and the Alternating Direction Method of Multipliers (ADMM). We propose a new family of second-order methods that approximate the loss function by a quadratic model; the superposition-structured regularizer then leads to a subproblem that can be solved efficiently by alternating minimization. We further propose a general active subspace selection approach that exploits the low-dimensional structure induced by the regularizers to speed up the solver, and we provide convergence guarantees for our algorithm. Empirically, we show that our approach is more than 10 times faster than state-of-the-art first-order methods on latent variable graphical model selection and multi-task learning problems with more than one regularizer. For these problems, ours appears to be the first algorithm to extend active subspace ideas to multiple regularizers.
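
To make the scheme concrete, below is a minimal sketch (Python/NumPy, not the authors' released code) of the outer quadratic approximation and the inner alternating minimization, specialized to a sparse-plus-low-rank decomposition Theta = S + L as in latent variable graphical model selection. The choice of penalties, step size, iteration counts, and the omission of the active subspace selection step and line search are all illustrative assumptions.

import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of tau * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svd_threshold(A, tau):
    """Proximal operator of tau * ||.||_* (singular-value soft-thresholding)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_newton_dirty(grad, hess_vec, S, L, lam_s, lam_l,
                      step=1.0, outer_iters=20, inner_iters=5):
    """Sketch: minimize f(S + L) + lam_s*||S||_1 + lam_l*||L||_* by forming a
    quadratic model of the smooth loss f at each outer iterate and running
    alternating proximal updates over the two components on that model.
    `grad(Theta)` returns the gradient of f; `hess_vec(Theta, d)` returns the
    Hessian-vector product of f at Theta applied to d; `step` should be at
    most 1 / ||Hessian|| for the inner updates to be valid prox-gradient steps.
    """
    for _ in range(outer_iters):
        Theta = S + L
        g = grad(Theta)
        dS = np.zeros_like(S)
        dL = np.zeros_like(L)
        for _ in range(inner_iters):
            # Alternating minimization on the quadratic subproblem:
            # q(d) = <g, d> + 0.5 <d, H d>, so grad_q(d) = g + H d.
            q_grad = g + hess_vec(Theta, dS + dL)
            dS = soft_threshold(S + dS - step * q_grad, step * lam_s) - S
            q_grad = g + hess_vec(Theta, dS + dL)
            dL = svd_threshold(L + dL - step * q_grad, step * lam_l) - L
        # Take the full Newton-like step; a line search would be used in practice.
        S, L = S + dS, L + dL
    return S, L

# Toy usage (hypothetical loss): f(Theta) = 0.5 * ||Theta - X||_F^2, so
# grad = Theta - X and the Hessian-vector product is the identity.
X = np.random.randn(30, 30)
S, L = prox_newton_dirty(lambda T: T - X, lambda T, d: d,
                         np.zeros_like(X), np.zeros_like(X),
                         lam_s=0.1, lam_l=0.5)

For the latent variable graphical model problem in the paper, f would instead be the log-determinant Gaussian likelihood; any smooth loss with a gradient and Hessian-vector product oracle can be plugged in unchanged.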
