Feature Selection With Redundancy-Constrained Class Separability

Scatter-matrix-based class separability is a simple and efficient feature selection criterion. However, the conventional trace-based formulation does not take feature redundancy into account and is prone to selecting a set of discriminative but mutually redundant features. In this brief, we first prove theoretically that, under this trace-based criterion, the existence of sufficiently correlated features can always prevent the optimal feature set from being selected. We then build on this criterion to propose redundancy-constrained feature selection (RCFS). To ensure the algorithm's efficiency and scalability, we study the characteristics of the redundancy constraints under which the resulting constrained 0-1 optimization problem can be solved globally and efficiently. Using the concept of total unimodularity (TUM) from integer programming, we derive a necessary condition for such constraints. This condition reveals an interesting special case in which qualified redundancy constraints can be conveniently generated by clustering the features. We study this special case and develop an efficient feature selection approach based on Dinkelbach's algorithm. Experiments on benchmark data sets demonstrate that our approach outperforms counterparts without redundancy constraints.
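As a rough illustration of the ingredients the abstract names, the sketch below combines a diagonal (per-feature) form of the scatter-trace ratio with clustering-generated redundancy constraints ("at most one feature per cluster") and a Dinkelbach iteration for the resulting 0-1 fractional program. This is a minimal sketch under those assumptions, not the brief's exact formulation; the function names (`scatter_traces`, `rcfs_dinkelbach`) and the greedy subproblem solver are illustrative.

```python
import numpy as np

def scatter_traces(X, y):
    """Diagonals of the between-class (S_b) and within-class (S_w)
    scatter matrices, so the trace criterion over a feature subset S
    reduces to sum(b[S]) / sum(w[S])."""
    overall_mean = X.mean(axis=0)
    b = np.zeros(X.shape[1])
    w = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        b += len(Xc) * (class_mean - overall_mean) ** 2
        w += ((Xc - class_mean) ** 2).sum(axis=0)
    return b, w

def rcfs_dinkelbach(X, y, clusters, m, max_iter=100, tol=1e-9):
    """Select m features maximizing sum(b[S]) / sum(w[S]) subject to
    at most one feature per redundancy cluster (m must not exceed the
    number of clusters).  Dinkelbach's algorithm replaces the ratio
    with a sequence of linear 0-1 subproblems
        max_S  sum_i (b_i - lam * w_i),
    each solved exactly here by a greedy step: keep the best-scoring
    feature in every cluster, then take the top-m clusters."""
    b, w = scatter_traces(X, y)
    lam = 0.0
    selected = None
    for _ in range(max_iter):
        score = b - lam * w
        best = {}  # cluster label -> index of its best-scoring feature
        for i, c in enumerate(clusters):
            if c not in best or score[i] > score[best[c]]:
                best[c] = i
        ranked = sorted(best.values(), key=lambda i: score[i], reverse=True)
        selected = np.array(ranked[:m])
        new_lam = b[selected].sum() / w[selected].sum()
        if abs(new_lam - lam) < tol:
            break  # Dinkelbach iteration has converged to the optimal ratio
        lam = new_lam
    return selected, lam
```

Because the "one per cluster" constraints decompose over clusters, the greedy step solves each linear subproblem exactly; this mirrors the special case the brief identifies, where clustering-generated constraints keep the 0-1 problem globally solvable. Under general TUM constraint matrices, each subproblem would instead be solved as a linear program. The cluster labels could come, for instance, from clustering the features by pairwise correlation.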
