Penalized Estimating Functions and Variable Selection in Semiparametric Regression Models

We propose a general strategy for variable selection in semiparametric regression models by penalizing appropriate estimating functions. Important applications include semiparametric linear regression with censored responses and semiparametric regression with missing predictors. Unlike existing penalized maximum likelihood estimators, the proposed penalized estimating functions need not be the derivatives of any objective function and may be discrete in the regression coefficients. We establish a general asymptotic theory for penalized estimating functions and present suitable numerical algorithms to implement the proposed estimators. In addition, we develop a resampling technique to estimate the variances of the estimated regression coefficients when the asymptotic variances cannot be evaluated directly. Simulation studies demonstrate that the proposed methods perform well in variable selection and variance estimation. We illustrate our methods using data from the Paul Coverdell Stroke Registry.
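
To make the idea of penalizing an estimating function concrete, the following is a minimal sketch of one common form such a construction can take. The notation is an assumption introduced here for illustration and does not appear in the abstract: $U(\beta)$ denotes an unpenalized estimating function for the $d$-dimensional coefficient vector $\beta$, $n$ the sample size, $p_\lambda$ a penalty function with tuning parameter $\lambda$, and $q_\lambda = p_\lambda'$ its derivative.

```latex
% Penalized estimating function (illustrative notation, applied componentwise)
U^{P}_{j}(\beta) \;=\; U_{j}(\beta) \;-\; n\, q_{\lambda}(|\beta_j|)\,\operatorname{sgn}(\beta_j),
\qquad j = 1, \dots, d .

% Two standard choices of q_lambda:
% Lasso penalty, p_lambda(theta) = lambda * theta for theta >= 0
q_{\lambda}(\theta) \;=\; \lambda ,

% SCAD penalty (Fan and Li), with a > 2 and theta >= 0
q_{\lambda}(\theta) \;=\; \lambda \left\{ I(\theta \le \lambda)
  + \frac{(a\lambda - \theta)_{+}}{(a-1)\lambda}\, I(\theta > \lambda) \right\} .
```

Under this sketch, the estimator is taken as an approximate zero of $U^{P}(\beta)$. When $U(\beta)$ is the score of a log-likelihood, the construction reduces to penalized maximum likelihood; when $U(\beta)$ is, for example, a rank-based or Buckley-James type function for censored data, no underlying objective function is required.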
