Ensemble-based Ultrahigh-dimensional Variable Screening

Since Fan and Lv [2008] introduced the sure independence screening (SIS) method, many variable screening methods have been proposed, based on different marginal measures under different models. However, most of these methods are designed for specific models. In practice, we often have very little information about the data-generating process, and different methods can select very different sets of features. This heterogeneity motivates us to combine several screening methods. In this paper, we introduce a general ensemble-based framework that efficiently combines the results of multiple variable screening methods. We establish the consistency and the sure screening property of the proposed framework. Extensive simulation studies confirm our intuition that the proposed ensemble-based method is more robust to model misspecification than any single variable screening method. We apply the proposed ensemble-based method to predict attention deficit hyperactivity disorder (ADHD) status from brain functional connectivity (FC).
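As a rough illustration of what combining screening methods can look like, the sketch below ranks features by two marginal utilities (absolute Pearson correlation, as in SIS, and absolute Spearman rank correlation) and keeps the features with the best average rank. The abstract does not specify the aggregation rule, so the rank-averaging step, the choice of utilities, and all function names (`rank_desc`, `pearson_utility`, `spearman_utility`, `ensemble_screen`) are assumptions made for this example, not the paper's procedure.

```python
import numpy as np


def rank_desc(scores):
    """Return ranks where rank 1 is the largest score (most relevant feature)."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


def pearson_utility(X, y):
    """Absolute marginal Pearson correlation of each column of X with y (SIS-style)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(num / den)


def spearman_utility(X, y):
    """Absolute Spearman correlation: Pearson on ranks (assumes no ties)."""
    Xr = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    yr = np.argsort(np.argsort(y)).astype(float)
    return pearson_utility(Xr, yr)


def ensemble_screen(X, y, d, utilities=(pearson_utility, spearman_utility)):
    """Keep the d features with the smallest average rank across utilities.

    Averaging ranks is an assumed aggregation rule chosen for illustration.
    """
    avg_rank = np.mean([rank_desc(u(X, y)) for u in utilities], axis=0)
    return np.sort(np.argsort(avg_rank, kind="stable")[:d])


# Simulated data: p >> n, with one linear and one nonlinear monotone signal.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] + np.exp(X[:, 1]) + 0.5 * rng.standard_normal(n)

selected = ensemble_screen(X, y, d=20)
print(selected)
```

Averaging ranks rather than raw utilities sidesteps the fact that different screening statistics live on different scales; other aggregation rules (for example, taking the best rank across methods) would fit the same skeleton.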

[1]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[2]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[3]  G. Glover,et al.  Resting-State Functional Connectivity in Major Depression: Abnormally Increased Contributions from Subgenual Cingulate Cortex and Thalamus , 2007, Biological Psychiatry.

[4]  Jun Zhang,et al.  Robust rank correlation based screening , 2010, 1012.4255.

[5]  B. Biswal,et al.  Network homogeneity reveals decreased integrity of default-mode network in ADHD , 2008, Journal of Neuroscience Methods.

[6]  S. Crawford,et al.  Volume 1 , 2012, Journal of Diabetes Investigation.

[7]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[8]  Schmidt Dieter,et al.  SCIENCE CHINA Mathematics , 2011 .

[9]  Lan Wang,et al.  Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data , 2013, 1304.2186.

[10]  Runze Li,et al.  Model-Free Feature Screening for Ultrahigh-Dimensional Data , 2011, Journal of the American Statistical Association.

[11]  M. Milham,et al.  The ADHD-200 Consortium: A Model to Advance the Translational Potential of Neuroimaging in Clinical Neuroscience , 2012, Front. Syst. Neurosci.

[12]  Francis R. Bach,et al.  Bolasso: model consistent Lasso estimation through the bootstrap , 2008, ICML '08.

[13]  Yang Feng,et al.  Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models , 2009, Journal of the American Statistical Association.

[14]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[15]  Jianqing Fan,et al.  Sure independence screening in generalized linear models with NP-dimensionality , 2009, The Annals of Statistics.

[16]  Jianqing Fan,et al.  Sure independence screening for ultrahigh dimensional feature space , 2006, math/0612857.

[17]  N. Tzourio-Mazoyer,et al.  Automated Anatomical Labeling of Activations in SPM Using a Macroscopic Anatomical Parcellation of the MNI MRI Single-Subject Brain , 2002, NeuroImage.

[18]  Tianzi Jiang,et al.  Altered resting-state functional connectivity patterns of anterior cingulate cortex in adolescents with attention deficit hyperactivity disorder , 2006, Neuroscience Letters.

[19]  Runze Li,et al.  Feature Screening via Distance Correlation Learning , 2012, Journal of the American Statistical Association.

[20]  Verónica Bolón-Canedo,et al.  Ensembles for feature selection: A review and future trends , 2019, Inf. Fusion.

[21]  Witold Pedrycz,et al.  Aggregation of Classifiers: A Justifiable Information Granularity Approach , 2017, IEEE Transactions on Cybernetics.

[22]  Yichao Wu,et al.  Ultrahigh Dimensional Feature Selection: Beyond The Linear Model , 2009, J. Mach. Learn. Res.

[23]  R. Fildes Journal of the American Statistical Association : William S. Cleveland, Marylyn E. McGill and Robert McGill, The shape parameter for a two variable graph 83 (1988) 289-300 , 1989 .

[24]  Liu Jingyuan,et al.  A selective overview of feature screening for ultrahigh-dimensional data , 2015, Science China Mathematics.

[25]  Dirk Van,et al.  Ensemble Methods: Foundations and Algorithms , 2012 .