Variable Selection for Panel Count Data via Non‐Concave Penalized Estimating Function

Variable selection is an important issue in all regression analyses, and in this paper we address it in the context of regression analysis of panel count data. Panel count data often arise in long-term studies concerned with the occurrence rate of a recurrent event, and their analysis has recently attracted a great deal of attention. However, there does not seem to exist any established approach to variable selection for panel count data. To address this gap, we adopt the idea behind the non-concave penalized likelihood approach and develop a non-concave penalized estimating function approach. The proposed methodology selects variables and estimates regression coefficients simultaneously, and an algorithm is presented for carrying this out. We show that the proposed procedure performs as well as the oracle procedure, in that it yields estimates as if the correct submodel were known. Simulation studies conducted to assess the performance of the proposed approach suggest that it works well in practical situations. An illustrative example from a cancer study is provided.
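To make the idea of a non-concave penalty concrete, the following is a minimal sketch of one widely used choice, the smoothly clipped absolute deviation (SCAD) penalty from the non-concave penalized likelihood literature. The abstract does not state which penalty the paper uses, so the choice of SCAD, the function name `scad_derivative`, and the default `a = 3.7` are assumptions made here for illustration only. In a penalized estimating function, a term of this form would typically be subtracted, coefficient by coefficient and with the sign of the coefficient, from the unpenalized estimating function.

```python
def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the SCAD penalty evaluated at |theta|.

    Illustrative assumption: the paper's abstract does not name its penalty.
    lam is the tuning parameter (lam > 0); a > 2 controls where the penalty
    levels off (a = 3.7 is a commonly used default).
    """
    t = abs(theta)
    if t <= lam:
        # Small coefficients receive a constant, lasso-like penalty rate.
        return lam
    # The rate decays linearly and reaches zero once |theta| >= a * lam,
    # so large coefficients are left essentially unpenalized.
    return max(a * lam - t, 0.0) / (a - 1.0)


if __name__ == "__main__":
    for value in (0.5, 2.0, 5.0):
        print(value, scad_derivative(value, lam=1.0))
```

Because the penalty rate vanishes for large coefficients, strong effects are estimated with little shrinkage while small ones are pushed to zero, which is the behavior behind the oracle property mentioned in the abstract.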
