Efficient Feature Subset Selection and Subset Size Optimization

A broad class of decision-making problems can be solved by a learning approach. This can be a feasible alternative when no analytical solution exists and no mathematical model can be constructed. In such cases the required knowledge can be gained from past data, which form the so-called learning or training set. The formal apparatus of statistical pattern recognition can then be used to learn the decision-making. The first and essential step of statistical pattern recognition is to solve the problem of feature selection (FS) or, more generally, dimensionality reduction (DR). Feature selection in statistical pattern recognition is the primary focus of this chapter. The problem fits into the wider context of dimensionality reduction (Section 2), which can be accomplished either by a linear or nonlinear mapping from the measurement space to a lower-dimensional feature space, or by measurement subset selection. This chapter will focus on the latter (Section 3). The main aspects of the problem, as well as the choice of the right feature selection tools, will be discussed (Sections 3.1 to 3.3). Several optimization techniques will be reviewed, with emphasis on the framework of sequential selection methods (Section 4). Related topics of recent interest will also be addressed, including the problem of subset size determination (Section 4.7), search acceleration through hybrid algorithms (Section 5), and the problem of feature selection stability and feature over-selection (Section 6).
