Weak signal identification and inference in penalized likelihood models for categorical responses

Penalized likelihood models are widely used to select variables and estimate model parameters simultaneously. However, the presence of weak signals can lead to inaccurate variable selection, biased parameter estimation, and invalid inference. Identifying weak signals accurately and drawing valid inferences are therefore crucial in penalized likelihood models. We develop a unified approach to identify weak signals and make inferences in penalized likelihood models, including the special case in which the responses are categorical. To identify weak signals, we use the estimated selection probability of each covariate as a measure of its signal strength and formulate a signal identification criterion. To construct confidence intervals, we propose a two-step inference procedure. Extensive simulation studies show that the proposed procedure outperforms several existing methods. We illustrate the method by applying it to the Practice Fusion diabetes data set.
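The sketch below is only illustrative of this style of procedure, not the authors' method. It estimates each covariate's selection probability by refitting an L1-penalized logistic regression (a categorical-response example) on bootstrap resamples, applies an assumed thresholding rule to separate strong, weak, and null signals, and then refits an unpenalized model on the retained covariates to obtain Wald confidence intervals as a stand-in for the paper's two-step inference. The thresholds, penalty level, and refitting step are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the authors' exact procedure):
# selection probabilities from bootstrap refits of a lasso-penalized logistic
# regression, an assumed thresholding rule for weak-signal identification, and
# an unpenalized refit for confidence intervals.
import numpy as np
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated binary-response data with one strong, one weak, and eight null signals.
n, p = 300, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[0], beta[1] = 1.5, 0.3
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

# Step 1: estimate selection probabilities over B bootstrap resamples.
B, C = 200, 0.5                      # C is an assumed inverse penalty strength
selected = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X[idx], y[idx])
    selected += (fit.coef_.ravel() != 0)
sel_prob = selected / B

# Assumed identification rule: strong if prob > 0.9, weak if in (0.3, 0.9], null otherwise.
strong = np.where(sel_prob > 0.9)[0]
weak = np.where((sel_prob > 0.3) & (sel_prob <= 0.9))[0]
keep = np.concatenate([strong, weak])

# Step 2 (stand-in for the two-step inference): refit an unpenalized logistic
# model on the retained covariates and report Wald confidence intervals.
refit = sm.Logit(y, sm.add_constant(X[:, keep])).fit(disp=0)
print("selection probabilities:", np.round(sel_prob, 2))
print(refit.conf_int())
```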
