Nonlinear Feature Selection with the Potential Support Vector Machine

We describe the “Potential Support Vector Machine” (P-SVM), a new filter method for feature selection. The idea behind P-SVM feature selection is to exchange the roles of features and data points in order to construct “support features”, which are the selected features. The P-SVM uses a novel objective function and novel constraints, one constraint for each feature. As with standard SVMs, the objective function represents a complexity or capacity measure, whereas the constraints enforce low empirical error. In this contribution we extend the P-SVM in two directions. First, we introduce a parameter which controls the redundancy among the selected features. Second, we propose a nonlinear version of P-SVM feature selection which is based on neural network techniques. Finally, the linear and nonlinear P-SVM feature selection approaches are demonstrated on toy data sets and on data sets from the NIPS 2003 feature selection challenge.
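
To make the exchange of roles concrete, the following is a minimal sketch of the linear P-SVM primal, stated under assumed notation (X denotes the data matrix with one row per data point, z_j its j-th feature column, y the vector of targets, and w, b the weight vector and offset); the paper itself gives the exact formulation, including the redundancy-control parameter:

\[
\min_{w,\,b}\ \tfrac{1}{2}\,\lVert X w \rVert_2^2
\qquad \text{subject to} \qquad
\bigl|\, z_j^{\top} \left( X w + b\,\mathbf{1} - y \right) \bigr| \le \epsilon
\quad \text{for each feature } j .
\]

The objective measures capacity through the model outputs on the training data rather than through \(\lVert w \rVert\), and each constraint bounds the correlation between one feature and the training residual, which is how the roles of data points and features are exchanged relative to a standard SVM. Features whose constraints are active at the optimum, i.e. whose Lagrange multipliers are nonzero, are the “support features”, and the parameter \(\epsilon\) controls how many of them are selected.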
