A Robust-Equitable Copula Dependence Measure for Feature Selection

Feature selection aims to select relevant features to improve the performance of predictors. Many feature selection methods depend on the choice of dependence measure. To select features that have complex nonlinear relationships with the response variable, the dependence measure should be equitable; i.e., it should treat linear and nonlinear relationships equally. In this paper, we introduce the concept of robust-equitability and identify a robust-equitable dependence measure, the robust copula dependence (RCD). This measure has the following advantages over existing dependence measures: it is robust to different relationship forms and robust to unequal sample sizes across features. In contrast, existing dependence measures cannot account for both factors simultaneously. Experiments on synthetic and real-world datasets confirm our theoretical analysis and illustrate the measure's advantage in feature selection.
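To make the idea concrete, a copula-based dependence measure of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes RCD is estimated as half the L1 distance between the copula density (estimated by kernel density estimation on rank-transformed data) and the independence copula density, which is constant 1 on the unit square. The function name `rcd` and the grid-based integration are illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde, rankdata

def rcd(x, y, grid_size=50):
    """Sketch of a robust copula dependence estimate:
    half the L1 distance between an estimated copula density
    and the independence copula density (constant 1)."""
    n = len(x)
    # Rank-transform each variable to pseudo-observations in (0, 1),
    # i.e., map the sample onto the copula scale.
    u = rankdata(x) / (n + 1)
    v = rankdata(y) / (n + 1)
    # Kernel density estimate of the copula density on (0, 1)^2.
    kde = gaussian_kde(np.vstack([u, v]))
    # Integrate |c(u, v) - 1| / 2 over a midpoint grid on the unit square.
    g = (np.arange(grid_size) + 0.5) / grid_size
    uu, vv = np.meshgrid(g, g)
    c = kde(np.vstack([uu.ravel(), vv.ravel()]))
    return 0.5 * np.mean(np.abs(c - 1.0))
```

Because the estimate depends only on ranks, it is invariant to monotone transformations of either variable, which is one reason a copula-scale measure can treat different relationship forms evenhandedly; for feature selection, features would be ranked by their estimated dependence with the response.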
