Estimating Feature-Label Dependence Using Gini Distance Statistics

Identifying statistical dependence between the features and the label is a fundamental problem in supervised learning. This paper presents a framework for estimating dependence between numerical features and a categorical label using generalized Gini distance, an energy distance in reproducing kernel Hilbert spaces (RKHS). Two Gini-distance-based dependence measures are explored: Gini distance covariance and Gini distance correlation. Pearson covariance and correlation do not characterize independence. These Gini-distance-based measures, by contrast, characterize both dependence and independence of random variables. The test statistics are simple to compute and do not require probability density estimation. Uniform convergence bounds and asymptotic bounds are derived for the test statistics, and comparisons with distance covariance statistics are provided. In the uniform convergence bounds, the Gini distance statistics are shown to converge faster than the distance covariance statistics, which yields tighter upper bounds on both Type I and Type II errors. Moreover, the probability that the Gini distance covariance statistic underperforms the distance covariance statistic in Type II error decreases to 0 exponentially as the sample size increases. Extensive experimental results demonstrate the performance of the proposed method.
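The abstract does not state the formulas, so the sketch below assumes a form used in the Gini correlation literature (an assumption, not taken from this text). Let Δ = E d(X, X') be the mean distance between two independent draws of the features, and let Δ_k be the same quantity within class k. With class probabilities p_k, the Gini distance covariance is then gCov = Δ − Σ_k p_k Δ_k, and the Gini distance correlation is gCor = gCov / Δ. Rearranging gives gCov = Σ_k p_k [2 E d(X_k, X) − Δ_k − Δ]. This is a weighted energy distance between each class-conditional feature distribution and the pooled one. Every expectation is estimated by averaging distances over sample pairs, which is why no density estimation is needed. For the generalized (RKHS) version, the Euclidean distance is replaced by the kernel-induced distance d_κ(x, x') = sqrt(κ(x, x) + κ(x', x') − 2κ(x, x')). A Gaussian kernel with bandwidth sigma is assumed here. All function names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import math
from collections import defaultdict
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def gaussian_kernel_distance(sigma):
    """Return the distance induced by a Gaussian kernel (assumed kernel choice).

    For kappa(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), kappa(x, x) = 1, so
    d_kappa(x, x')^2 = kappa(x, x) + kappa(x', x') - 2 kappa(x, x') = 2 - 2 kappa(x, x').
    """
    def dist(a, b):
        sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        k = math.exp(-sq / (2.0 * sigma ** 2))
        return math.sqrt(max(0.0, 2.0 - 2.0 * k))  # clamp tiny negative rounding
    return dist


def mean_pairwise_distance(points, dist):
    """U-statistic estimate of E d(X, X'): average distance over all unordered pairs."""
    pairs = list(combinations(points, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def gini_statistics(X, y, dist=euclidean):
    """Sample Gini distance covariance and correlation (hypothetical helper).

    Assumes gCov = Delta - sum_k p_k * Delta_k and gCor = gCov / Delta,
    with p_k estimated by the class frequency n_k / n.
    """
    n = len(X)
    if n != len(y) or n < 2:
        raise ValueError("X and y must have the same length, at least 2")
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    if any(len(pts) < 2 for pts in by_class.values()):
        raise ValueError("every class needs at least two samples")
    delta = mean_pairwise_distance(X, dist)  # pooled mean distance
    within = sum(len(pts) / n * mean_pairwise_distance(pts, dist)
                 for pts in by_class.values())  # weighted within-class mean distance
    gcov = delta - within
    gcor = gcov / delta if delta > 0 else 0.0
    return gcov, gcor


# Two well-separated classes on a single feature.
X = [[0.0], [0.1], [5.0], [5.1]]
y = [0, 0, 1, 1]
print(gini_statistics(X, y))
print(gini_statistics(X, y, dist=gaussian_kernel_distance(sigma=1.0)))
```

Both calls report a large correlation because the two classes barely overlap. With the Euclidean distance, this toy example gives gCor = 98/101. Because the pairwise averages are U-statistics, the sample gCov can come out slightly negative when the features and the label are independent. The pairwise loops cost O(n²) distance evaluations.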
