Novel Methods for Feature Subset Selection with Respect to Problem Knowledge

This chapter presents recent advances in statistical methods for selecting optimal feature subsets for data representation and classification. It offers guidance on which approach to choose given the extent of a priori knowledge about the problem. Two basic approaches are reviewed, and the conditions under which each should be used are specified. The first relies on the computationally efficient floating search methods. The second trades the requirement for a priori information for a requirement of sufficient data to represent the distributions involved. By its nature, it is particularly suitable when the underlying probability distributions are not unimodal. This approach attempts to perform feature selection and decision rule inference simultaneously. Depending on the criterion adopted, it has two variants, which select features for either optimal representation or optimal discrimination.
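To make the floating search idea concrete, the following is a minimal sketch of sequential forward floating selection, not the authors' implementation. The function name `sffs`, its parameters, and the `criterion` callable (a user-supplied score to be maximized over candidate feature subsets) are assumptions introduced here for illustration. Each forward step adds the single best feature. It is followed by conditional backward steps that drop a feature only while doing so beats the best subset previously found at that smaller size.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def sffs(n_features: int, k: int, criterion: Callable[[Subset], float]) -> Subset:
    """Sketch of sequential forward floating selection (hypothetical helper).

    `criterion` is an assumed user-supplied function that scores a subset of
    feature indices; larger is better.
    """
    if not 0 < k <= n_features:
        raise ValueError("k must satisfy 0 < k <= n_features")

    current: Subset = frozenset()
    # Best (subset, score) found so far for each subset size.
    best: Dict[int, Tuple[Subset, float]] = {}

    while len(current) < k:
        # Forward step: add the feature that maximizes the criterion.
        candidates = [current | {f} for f in range(n_features) if f not in current]
        current = max(candidates, key=criterion)
        score = criterion(current)
        size = len(current)
        if size not in best or score > best[size][0:2][1]:
            best[size] = (current, score)
        else:
            # A better subset of this size is already known; continue from it.
            current = best[size][0]

        # Floating (conditional backward) steps: remove the least useful
        # feature only while that strictly improves on the best known
        # subset of the smaller size.
        while len(current) > 2:
            reduced = max((current - {f} for f in current), key=criterion)
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][1]:
                current = reduced
                best[len(reduced)] = (reduced, reduced_score)
            else:
                break

    return best[k][0]


if __name__ == "__main__":
    # Toy additive criterion, purely illustrative.
    weights = [1, 9, 5, 7]
    print(sorted(sffs(len(weights), 2, lambda s: sum(weights[i] for i in s))))
```

With an additive criterion such as the toy one above, the backward steps never fire and the search reduces to plain forward selection. The floating steps matter when features interact, so that a feature added early becomes redundant once others are present.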
