Selection Consistency of Lasso-Based Procedures for Misspecified High-Dimensional Binary Model and Random Regressors

We consider selection of random predictors for a high-dimensional regression problem with a binary response under a general loss function. An important special case arises when the binary model is semi-parametric and the response function is misspecified under the fitted parametric model. When the true response coincides with the postulated parametric response for some value of the parameter, we recover the usual framework for parametric inference; both correct specification and misspecification are thus covered by this contribution. Variable selection in this scenario aims at recovering, with high probability, the support of the minimizer of the associated risk. We propose a two-step Screening-Selection (SS) procedure that first screens and orders predictors with the Lasso and then selects the subset of predictors minimizing the Generalized Information Criterion (GIC) over the corresponding nested family of models. We prove consistency of the proposed selection method under conditions that allow the number of predictors to be much larger than the number of observations. For the semi-parametric case, when the distribution of the random predictors satisfies the linear regressions condition, the true and the estimated parameters are collinear, and their common support can be consistently identified. This partly explains the robustness of selection procedures to misspecification of the response function.
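The following is a minimal sketch of the two-step SS procedure under illustrative assumptions not fixed by the abstract: logistic loss as the postulated parametric fit, scikit-learn's l1-penalized logistic regression for the screening step, and a BIC-type GIC penalty a_n = log n. Function names, the tuning constant lam, and the nearly unpenalized refit are hypothetical choices for the sketch, not the paper's prescriptions.

```python
# Sketch of the Screening-Selection (SS) procedure: Lasso screening/ordering,
# then GIC minimization over the induced nested family of models.
# Assumptions (illustrative, not from the paper): logistic loss, BIC-type
# penalty a_n = log(n), and a large-C logistic fit to approximate the MLE.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def _loglik(X_sub, y):
    """Maximized logistic log-likelihood on the given columns
    (intercept-only model when X_sub has no columns)."""
    if X_sub.shape[1] == 0:
        p_hat = np.clip(y.mean(), 1e-12, 1 - 1e-12)
        return float(np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat)))
    # Nearly unpenalized fit (very large C) approximates the logistic MLE.
    mle = LogisticRegression(C=1e8, solver="lbfgs", max_iter=1000).fit(X_sub, y)
    return -log_loss(y, mle.predict_proba(X_sub)[:, 1], normalize=False)


def screening_selection(X, y, lam=0.1):
    """Two-step SS procedure; returns indices of the selected predictors."""
    n, p = X.shape

    # Step 1 (screening): l1-penalized logistic regression; keep predictors
    # with nonzero coefficients, ordered by decreasing absolute coefficient.
    lasso = LogisticRegression(penalty="l1", C=1.0 / (n * lam),
                               solver="liblinear").fit(X, y)
    coef = lasso.coef_.ravel()
    support = np.flatnonzero(coef)
    order = support[np.argsort(-np.abs(coef[support]))]

    # Step 2 (selection): minimize GIC over the nested family
    # {} ⊂ {j1} ⊂ {j1, j2} ⊂ ... induced by the Lasso ordering.
    a_n = np.log(n)  # BIC-type penalty; the theory allows more general a_n
    gic = [-2.0 * _loglik(X[:, order[:k]], y) + a_n * k
           for k in range(len(order) + 1)]
    k_hat = int(np.argmin(gic))
    return order[:k_hat]
```

Ordering the screened predictors by absolute Lasso coefficient and scanning only the nested family reduces the search from 2^p subsets to at most p + 1 candidate models, which is what makes the GIC step feasible in high dimensions.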
