Martingale Difference Correlation and Its Use in High-Dimensional Variable Screening

In this article, we propose a new metric, the martingale difference correlation, to measure the departure from conditional mean independence between a scalar response variable V and a vector predictor variable U. Our metric is a natural extension of the distance correlation proposed by Székely, Rizzo, and Bakirov, which measures the dependence between V and U. The martingale difference correlation and its empirical counterpart inherit a number of desirable features of distance correlation and sample distance correlation, such as algebraic simplicity and elegant theoretical properties. We further use martingale difference correlation as a marginal utility for high-dimensional variable screening, to screen out variables that do not contribute to the conditional mean of the response given the covariates. An extension to conditional quantile screening is also described in detail, and sure screening properties are rigorously justified. Both simulation results and real data illustrations demonstrate the effectiveness of martingale difference correlation-based screening procedures in comparison with existing counterparts. Supplementary materials for this article are available online.
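
To make the idea of a sample martingale difference correlation used as a marginal screening utility more concrete, here is a minimal Python sketch. The abstract itself gives no formulas, so the estimator below is an assumption based on the commonly cited sample definition: MDD_n^2 = -(1/n^2) Σ_{k,l} (V_k - V̄)(V_l - V̄)|U_k - U_l|, normalized by the sample variance of V times the sample distance standard deviation of U. The function names `mdc_squared` and `screen` are hypothetical, chosen for illustration only.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mdc_squared(u, v):
    """Squared sample martingale difference correlation of v given u.

    u: list of n predictor vectors (each a sequence of floats).
    v: list of n scalar responses.
    Assumed estimator (not stated in the abstract); see the lead-in.
    """
    n = len(v)
    v_bar = sum(v) / n
    vc = [x - v_bar for x in v]  # centered responses

    # Pairwise distance matrix of the predictor.
    a = [[_dist(u[k], u[l]) for l in range(n)] for k in range(n)]

    # Squared sample martingale difference divergence.
    mdd2 = -sum(vc[k] * vc[l] * a[k][l]
                for k in range(n) for l in range(n)) / n ** 2

    # Squared sample distance variance of u via double centering.
    row = [sum(r) / n for r in a]
    grand = sum(row) / n
    dvar2 = sum((a[k][l] - row[k] - row[l] + grand) ** 2
                for k in range(n) for l in range(n)) / n ** 2

    var_v = sum(c * c for c in vc) / n
    denom = var_v * math.sqrt(dvar2)
    # Convention assumed here: return 0 when either variable is degenerate.
    return mdd2 / denom if denom > 0 else 0.0


def screen(columns, v, d):
    """Return indices of the d columns with the largest marginal utility.

    columns: list of p predictor columns, each a list of n scalars.
    """
    scores = [mdc_squared([[x] for x in col], v) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:d]
```

As a usage illustration, with `v = [0, 0, 2, 2]`, the column `[-1, -1, 1, 1]` determines the mean of `v`, while the column `[-1, 1, -1, 1]` leaves the conditional mean of `v` constant, so `screen` ranks the first column ahead of the second. The sketch costs O(n^2) per predictor and is meant only to show the shape of the computation, not the authors' implementation.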

[1]  Yang Feng,et al.  High-dimensional variable selection for Cox's proportional hazards model , 2010, 1002.3315.

[2]  Yang Feng,et al.  Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models , 2009, Journal of the American Statistical Association.

[3]  Runze Li,et al.  Feature Screening via Distance Correlation Learning , 2012, Journal of the American Statistical Association.

[4]  Thomas L Casavant,et al.  Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet-Biedl syndrome gene (BBS11). , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[5]  J. Horowitz,et al.  VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS. , 2010, Annals of statistics.

[6]  W Y Zhang,et al.  Discussion on 'Sure independence screening for ultra-high dimensional feature space' by Fan, J and Lv, J. , 2008 .

[7]  Peter Hall,et al.  Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems , 2009 .

[8]  K. Vranizan,et al.  Conditional expression of a Gi-coupled receptor causes ventricular conduction delay and a lethal cardiomyopathy. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Jianqing Fan,et al.  Sure independence screening in generalized linear models with NP-dimensionality , 2009, The Annals of Statistics.

[10]  Yi Li,et al.  Principled sure independence screening for Cox models with ultra-high-dimensional covariates , 2012, J. Multivar. Anal.

[11]  Jianqing Fan,et al.  Sure independence screening for ultrahigh dimensional feature space , 2006, math/0612857.

[12]  Runze Li,et al.  Quantile Regression for Analyzing Heterogeneity in Ultra-High Dimension , 2012, Journal of the American Statistical Association.

[13]  Thomas H. Scheike,et al.  Independent screening for single‐index hazard rate models with ultrahigh dimensional features , 2011, 1105.3361.

[14]  Runze Li,et al.  Model-Free Feature Screening for Ultrahigh-Dimensional Data , 2011, Journal of the American Statistical Association.

[15]  Jun Zhang,et al.  Robust rank correlation based screening , 2010, 1012.4255.

[16]  V. Sheffield,et al.  Regulation of gene expression in the mammalian eye and its relevance to eye disease , 2006, Proceedings of the National Academy of Sciences.

[17]  Bing Li,et al.  Dimension reduction for the conditional mean in regressions with categorical predictors , 2003 .

[18]  Aurore Delaigle,et al.  EFFECT OF HEAVY TAILS ON ULTRA HIGH DIMENSIONAL VARIABLE RANKING METHODS , 2012 .

[19]  Maria L. Rizzo,et al.  Measuring and testing dependence by correlation of distances , 2007, 0803.4101.

[20]  S. Geer,et al.  High-dimensional additive modeling , 2008, 0806.4115.

[21]  Lan Wang,et al.  Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data , 2013, 1304.2186.

[22]  R. Cook,et al.  Dimension reduction for conditional mean in regression , 2002 .

[23]  Maria L. Rizzo,et al.  Brownian distance covariance , 2009, 1010.0297.

[24]  Yichao Wu,et al.  Ultrahigh Dimensional Feature Selection: Beyond The Linear Model , 2009, J. Mach. Learn. Res.