Kernel Feature Selection via Conditional Covariance Minimization

We propose a method for feature selection that employs kernel-based measures of independence to find a subset of covariates that is maximally predictive of the response. Building on past work in kernel dimension reduction, we show how to perform feature selection via a constrained optimization problem involving the trace of the conditional covariance operator. We prove various consistency results for this procedure, and also demonstrate that our method compares favorably with other state-of-the-art algorithms on a variety of synthetic and real data sets.
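The criterion described above can be sketched empirically. The following is a minimal, illustrative Python sketch (not the paper's implementation): it evaluates a regularized trace of the conditional covariance operator, Q(w) = ε · Tr[G_Y (G_{X∘w} + nεI)⁻¹], where G_Y and G_{X∘w} are centered RBF Gram matrices and w is a vector of feature weights. The function names, the choice of RBF kernel, and the bandwidth and regularization values are assumptions for the sake of the example; a smaller value of Q(w) indicates that the weighted features are more predictive of the response.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """RBF Gram matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center(K):
    """Center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def ccm_objective(w, X, Y, sigma=1.0, eps=1e-3):
    """Empirical conditional-covariance trace criterion (illustrative).

    Q(w) = eps * Tr[ G_Y (G_{X*w} + n*eps*I)^{-1} ],
    where X*w rescales each feature by its weight in w.
    Lower values mean the selected/weighted features leave less
    residual variation in Y, i.e. they are more predictive.
    """
    n = X.shape[0]
    Gx = center(rbf_gram(X * w, sigma))          # Gram of weighted covariates
    Gy = center(rbf_gram(Y, sigma))              # Gram of the response
    A = Gx + n * eps * np.eye(n)                 # regularized operator
    # Tr[Gy A^{-1}] computed via a linear solve rather than an explicit inverse
    return eps * np.trace(np.linalg.solve(A, Gy))
```

In a full feature-selection procedure, one would minimize this objective over constrained weights w (e.g., w on a simplex or box, as in a continuous relaxation of subset selection); here it can simply be compared across candidate feature subsets, with lower values preferred.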
