Fast Restoration Dantzig Selection for Censored Data

Dimension reduction, model and variable selection have become ubiquitous concepts in modern statistical science. This paper is concerned with simultaneous estimation and variable selection in the linear model or least-squares setup, a principal building block of complete-data model selection techniques. In contrast to the complete-data setup, we consider the common situation where the outcomes may be right-censored. The three most common estimators in this setting are the rank-based estimator, the Buckley-James estimator, and the inverse-probability-weighted (IPW) estimator. The Buckley-James and IPW estimators are popular among applied statisticians because they are based on simple missing-data principles and require no knowledge of linear programming. However, both estimators suffer difficulties: the IPW estimator is particularly inefficient, while the Buckley-James estimating function possesses multiple roots in both large and small samples, and its iterative estimation procedure may result in infinite oscillation. Recently, authors have rediscovered that infinite oscillation can be avoided with a consistent initial value. The goal of this paper is to offer a new Buckley-James-type estimator for high-dimensional, sparse regression in the accelerated failure time model. The idea is to start with a good initial value and replace linear regression with the Dantzig selector (DS). Compared with competing DS-type estimators for censored survival data, our method is much faster because it performs the DS optimization once, not iteratively. We illustrate the utility of this method through simulation studies and application to three real data sets.
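The one-step recipe described above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes a crude least-squares fit on the uncensored observations as the initial value (the paper would use a consistent estimator such as a rank-based one), a standard Buckley-James imputation step based on the Kaplan-Meier estimate of the residual distribution, and the Dantzig selector solved as a linear program. The function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def km_residual_imputation(y, delta, X, beta0):
    """Buckley-James step: replace each censored outcome with its conditional
    expectation under the Kaplan-Meier estimate of the residual distribution."""
    e = y - X @ beta0                          # residuals at the initial fit
    order = np.argsort(e)
    e_s, d_s = e[order], delta[order]
    n = len(e)
    at_risk = n - np.arange(n)                 # risk set sizes for sorted residuals
    surv = np.cumprod(1.0 - d_s / at_risk)     # Kaplan-Meier survival of residuals
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    dF = surv_prev - surv                      # KM jump sizes (zero at censored points)
    ystar = y.copy()
    for i in np.where(delta == 0)[0]:
        mask = e_s > e[i]
        s_i = np.sum(dF[mask])
        if s_i > 0:                            # conditional mean residual beyond e_i
            ystar[i] = X[i] @ beta0 + np.sum(e_s[mask] * dF[mask]) / s_i
    return ystar

def dantzig_selector(X, y, lam):
    """Dantzig selector: min ||b||_1  s.t.  ||X'(y - Xb)||_inf <= lam,
    solved as an LP with the split b = u - v, u, v >= 0."""
    n, p = X.shape
    G, g = X.T @ X, X.T @ y
    c = np.ones(2 * p)                         # objective: sum(u) + sum(v) = ||b||_1
    A_ub = np.block([[G, -G], [-G, G]])        # G b <= g + lam and -G b <= lam - g
    b_ub = np.concatenate([g + lam, lam - g])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    uv = res.x
    return uv[:p] - uv[p:]

# Hypothetical simulated AFT data with right censoring.
rng = np.random.default_rng(0)
n, p = 80, 6
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0])
T = X @ beta_true + 0.5 * rng.standard_normal(n)
C = X @ beta_true + rng.standard_normal(n) + 1.0
y = np.minimum(T, C)
delta = (T <= C).astype(float)                 # 1 = event observed, 0 = censored

# Step 1: initial value (stand-in for a consistent estimator).
beta0 = np.linalg.lstsq(X[delta == 1], y[delta == 1], rcond=None)[0]
# Step 2: impute once, then run the Dantzig optimization once.
ystar = km_residual_imputation(y, delta, X, beta0)
beta_hat = dantzig_selector(X, ystar, lam=2.0)
```

The point of the sketch is the control flow: unlike iterative DS-type procedures, the imputation and the LP are each performed a single time, which is where the speed advantage claimed in the abstract comes from.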
