Closed-form Estimators for High-dimensional Generalized Linear Models

We propose a class of closed-form estimators for GLMs in high-dimensional sampling regimes. Our estimators are built from closed-form variants of the vanilla unregularized MLE that remain well-defined even in high-dimensional settings; we then apply thresholding operations to this MLE variant to obtain our class of estimators. We provide a unified statistical analysis of this class and show that it enjoys strong guarantees on both parameter error and variable selection which, surprisingly, match those of the more complex regularized GLM MLEs, even though our closed-form estimators are computationally far simpler. We derive instantiations of our class of estimators, together with corollaries of our general theorem, for the special cases of logistic, exponential, and Poisson regression models, and corroborate the surprising statistical and computational performance of our estimators via extensive simulations.
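To make the recipe concrete, below is a minimal Python sketch of the two-step construction described above, for the logistic case: form a closed-form pilot estimate from an invertible proxy of the sample covariance and inverse-link-transformed responses, then soft-threshold the pilot to induce sparsity. The specific choices here (elementwise soft-thresholding of the off-diagonal covariance entries, a small ridge to guarantee invertibility, smoothing binary responses away from {0, 1} before applying the logit, and the name closed_form_logistic) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding: S_lam(a) = sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def closed_form_logistic(X, y, nu=0.05, lam=0.1, eps=1e-2):
    """Closed-form sparse logistic regression estimate (illustrative sketch).

    Step 1: build a well-defined, invertible proxy of the sample covariance
            by soft-thresholding its off-diagonal entries (plus a tiny ridge).
    Step 2: form a closed-form pilot estimate by applying the inverse (logit)
            link to responses smoothed away from the boundary {0, 1}.
    Step 3: soft-threshold the pilot estimate to obtain a sparse solution.
    """
    n, p = X.shape
    Sigma = X.T @ X / n
    Sigma_t = soft_threshold(Sigma, nu)        # threshold all entries ...
    np.fill_diagonal(Sigma_t, np.diag(Sigma))  # ... but keep the diagonal
    Sigma_t += 1e-6 * np.eye(p)                # small ridge: ensure invertibility
    z = np.log((y + eps) / (1.0 - y + eps))    # smoothed logit of binary y
    theta_pilot = np.linalg.solve(Sigma_t, X.T @ z / n)  # closed-form pilot
    return soft_threshold(theta_pilot, lam)    # sparsify by thresholding

if __name__ == "__main__":
    # Toy high-dimensional check: n = 200 samples, p = 500 features, 5 active.
    rng = np.random.default_rng(0)
    n, p, s = 200, 500, 5
    X = rng.standard_normal((n, p))
    theta_star = np.zeros(p)
    theta_star[:s] = 1.0
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_star)))
    theta_hat = closed_form_logistic(X, y)
    print("nonzeros recovered:", np.flatnonzero(theta_hat))
```

Note that the entire procedure costs a single p-by-p linear solve plus elementwise thresholding, which is the source of the computational advantage over iterative regularized-MLE solvers; the levels nu and lam play the role of the regularization parameters in the statistical analysis.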
