Robust feature screening for elliptical copula regression model

In this paper, we propose a flexible semi-parametric regression model called the Elliptical Copula Regression (ECR) model, which covers a large class of linear and nonlinear regression models, such as the additive regression model and the linear transformation model. In addition, the ECR model can capture heavy tails and tail dependence between variables, so it can be widely applied in areas such as econometrics and finance. We focus on the feature screening problem for the ECR model in an ultra-high dimensional setting. We propose a robust feature screening procedure for the ECR model that involves two types of correlation coefficients: Kendall’s τ correlation and canonical correlation. Theoretical analysis shows that the procedure enjoys the sure screening property, i.e., with probability tending to 1, it selects all important variables while substantially reducing the dimensionality to a moderate size relative to the sample size. Thorough numerical studies illustrate its advantage over existing feature screening methods. Finally, the proposed procedure is applied to a real gene-expression data set to show its empirical usefulness.
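To make the rank-based part of the idea concrete, the following is a minimal sketch of marginal screening with Kendall's τ, not the procedure proposed in the paper. It assumes the standard relation ρ = sin(πτ/2) between Kendall's τ and the latent correlation under elliptical-type models, ranks features by the absolute transformed coefficient, and omits the canonical correlation step entirely. The function name `kendall_screen`, the simulated data, and the cutoff d = ⌊n / log n⌋ are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau


def kendall_screen(X, y, d):
    """Rank features by |sin(pi/2 * tau_j)| and keep the top d.

    tau_j is Kendall's tau between column j of X and y. The sine
    transform maps tau to a latent correlation under elliptical-type
    models (an assumption of this sketch).
    """
    p = X.shape[1]
    scores = np.empty(p)
    for j in range(p):
        tau, _ = kendalltau(X[:, j], y)
        scores[j] = abs(np.sin(np.pi * tau / 2.0))
    order = np.argsort(-scores)  # indices sorted by descending score
    return order[:d], scores


# Hypothetical simulated data: heavy-tailed features and a monotone
# (exponential) transformation of a linear signal in features 0 and 1.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_t(df=3, size=(n, p))
noise = 0.5 * rng.standard_t(df=3, size=n)
y = np.exp(2.0 * X[:, 0] - 2.0 * X[:, 1] + noise)

d = int(n / np.log(n))  # illustrative cutoff, assumed for this sketch
selected, scores = kendall_screen(X, y, d)
print(sorted(selected.tolist())[:10])
```

Because Kendall's τ depends only on ranks, the scores are unchanged by monotone transformations of the response or of individual features, which is why the exponential link and the heavy-tailed noise in the simulated data do not need special handling.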
