Nonlinear Feature Selection with the Potential Support Vector Machine

We describe the “Potential Support Vector Machine” (P-SVM), a new filter method for feature selection. The idea behind P-SVM feature selection is to exchange the roles of features and data points in order to construct “support features”, which are the selected features. The P-SVM uses a novel objective function and novel constraints, one constraint for each feature. As with standard SVMs, the objective function represents a complexity or capacity measure, whereas the constraints enforce low empirical error. In this contribution we extend the P-SVM in two directions. First, we introduce a parameter which controls the redundancy among the selected features. Second, we propose a nonlinear version of P-SVM feature selection which is based on neural network techniques. Finally, the linear and nonlinear P-SVM feature selection approaches are demonstrated on toy data sets and on data sets from the NIPS 2003 feature selection challenge.
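
To make the exchange of roles concrete, the following is a minimal sketch of the linear P-SVM primal, stated under assumed notation (X denotes the data matrix with one row per data point, z_j its j-th feature column, y the vector of targets, and w, b the weight vector and offset); the paper itself gives the exact formulation, including the redundancy-control parameter:

\[
\min_{w,\,b}\ \tfrac{1}{2}\,\lVert X w \rVert_2^2
\qquad \text{subject to} \qquad
\bigl|\, z_j^{\top} \left( X w + b\,\mathbf{1} - y \right) \bigr| \le \epsilon
\quad \text{for each feature } j .
\]

The objective measures capacity through the model outputs on the training data rather than through \(\lVert w \rVert\), and each constraint bounds the correlation between one feature and the training residual, which is how the roles of data points and features are exchanged relative to a standard SVM. Features whose constraints are active at the optimum, i.e. whose Lagrange multipliers are nonzero, are the “support features”, and the parameter \(\epsilon\) controls how many of them are selected.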
