A Robust Partial Correlation-based Screening Approach

As a computationally fast and effective tool, sure independence screening has received much attention in ultrahigh-dimensional problems. This paper contributes two robust sure screening approaches that, from a model-free perspective, simultaneously account for heteroscedasticity, outliers, heavy-tailed distributions, continuous or discrete responses, and confounding effects. First, we define a robust correlation measure using only two random indicators and introduce a screener based on that correlation. Second, we propose a robust partial correlation-based screening approach for the case where an exposure variable is available. To remove the confounding effect of the exposure on both the response and each covariate, we use nonparametric regression with a specified loss function. Together, these yield a robust correlation-based screening method (RC-SIS) and a robust partial correlation-based screening framework (RPC-SIS) comprising two concrete screeners, RPC-SIS(L2) and RPC-SIS(L1). Third, under some regularity conditions, we establish the sure screening properties of RC-SIS, for a response that may be either continuous or discrete, as well as those of RPC-SIS(L2) and RPC-SIS(L1). Our approaches are essentially nonparametric and are robust with respect to both the response and the covariates. Finally, extensive simulation studies and two applications demonstrate the superiority of the proposed approaches.
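The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea under our own assumptions. We take the "robust correlation" to be the average, over sample points (x_j, y_j), of the squared sample correlation between the indicators 1{X <= x_j} and 1{Y <= y_j}. For the partial version we remove the exposure Z from the response and from each covariate with a Gaussian-kernel Nadaraya-Watson fit (squared, L2, loss) or a kernel-weighted median (absolute, L1, loss) and then screen on the residuals. The function names (`indicator_corr`, `robust_corr`, `local_mean`, `local_median`, `screen`), the kernel and the bandwidth `h` are hypothetical choices, not the paper's definitions of RC-SIS or RPC-SIS.

```python
# Hypothetical sketch of indicator-based (partial) correlation screening.
# Not the paper's exact RC-SIS / RPC-SIS definitions.
import math
import random


def indicator_corr(x, y, a, b):
    """Sample Pearson correlation between the indicators 1{x <= a} and 1{y <= b}."""
    n = len(x)
    u = [1.0 if xi <= a else 0.0 for xi in x]
    v = [1.0 if yi <= b else 0.0 for yi in y]
    mu, mv = sum(u) / n, sum(v) / n
    var_u, var_v = mu * (1 - mu), mv * (1 - mv)  # variances of 0/1 indicators
    if var_u == 0 or var_v == 0:
        return 0.0  # a constant indicator carries no information
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / n
    return cov / math.sqrt(var_u * var_v)


def robust_corr(x, y):
    """Assumed measure: mean squared indicator correlation over the sample points."""
    n = len(x)
    return sum(indicator_corr(x, y, x[j], y[j]) ** 2 for j in range(n)) / n


def _kernel_weights(z, z0, h):
    # Gaussian kernel weights centred at z0 with bandwidth h (assumed choice)
    return [math.exp(-0.5 * ((zi - z0) / h) ** 2) for zi in z]


def local_mean(z, t, h):
    """L2 loss: Nadaraya-Watson estimate of E[t | z] at each sample point."""
    out = []
    for z0 in z:
        w = _kernel_weights(z, z0, h)
        out.append(sum(wi * ti for wi, ti in zip(w, t)) / sum(w))
    return out


def local_median(z, t, h):
    """L1 loss: kernel-weighted median of t around each sample point."""
    out = []
    for z0 in z:
        w = _kernel_weights(z, z0, h)
        half, acc = sum(w) / 2, 0.0
        for ti, wi in sorted(zip(t, w)):
            acc += wi
            if acc >= half:
                out.append(ti)
                break
    return out


def screen(X, y, d, z=None, loss="L2", h=0.5):
    """Return the indices of the d covariates (columns of X) with the largest scores.

    If an exposure z is given, y and every covariate are first replaced by their
    residuals from a nonparametric fit on z (the partial-correlation variant).
    """
    if z is not None:
        fit = local_mean if loss == "L2" else local_median
        y = [yi - fi for yi, fi in zip(y, fit(z, y, h))]
        X = [[xi - fi for xi, fi in zip(col, fit(z, col, h))] for col in X]
    scores = [robust_corr(col, y) for col in X]
    order = sorted(range(len(X)), key=lambda k: scores[k], reverse=True)
    return order[:d]


# Toy data: every covariate shares the exposure z; y depends on z and on covariate 0,
# plus heavy-tailed (Cauchy) noise.
random.seed(0)
n, p = 100, 20
z = [random.gauss(0, 1) for _ in range(n)]
X = [[zi + random.gauss(0, 1) for zi in z] for _ in range(p)]
y = [zi + 2 * x0 + 0.1 * math.tan(math.pi * (random.random() - 0.5))
     for zi, x0 in zip(z, X[0])]
print(screen(X, y, d=3))                     # unadjusted, RC-type screening
print(screen(X, y, d=3, z=z, loss="L2"))     # exposure-adjusted, L2 fit
print(screen(X, y, d=3, z=z, loss="L1"))     # exposure-adjusted, L1 fit
```

In this toy setup covariate 0 is expected to rank near the top. In the unadjusted case the score depends on each variable only through indicators, so it is unchanged by strictly increasing transformations and is insensitive to the magnitude of outliers; the O(n^2) cost per covariate is acceptable for a sketch but would need a rank-based implementation for large n.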
