Artificial Intelligence and Information Technology

Evaluating feature selection methods for learning in data mining applications

Recent advances in computing technology, including gains in speed, reductions in cost, and access to tremendous amounts of computing power, have made it possible to process huge amounts of data in reasonable time and have spurred increased interest in data mining applications. Machine learning is one of the methods used in most of these applications. The data used as input to a learning system are its primary source of knowledge and determine what it learns. Yet relatively few studies have examined the preprocessing of data used as input to data mining systems. In this study, we evaluate the effectiveness of several feature selection methods for preprocessing input data, using real-world financial credit-risk data.
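To make the idea of feature selection as a preprocessing step concrete, here is a minimal sketch of one generic filter-style approach: ranking discrete features by their mutual information with the class label. The abstract does not name the methods evaluated, so this technique is an assumption chosen purely for illustration. The function names (`mutual_information`, `rank_features`) and the toy data are hypothetical and are not the credit-risk data used in the study.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of label values
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature_index, score) pairs, highest mutual information first."""
    n_features = len(rows[0])
    scores = [
        (j, mutual_information([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: feature 0 matches the label exactly,
# feature 1 is statistically independent of it.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
ranking = rank_features(rows, labels)
```

In this toy case, feature 0 ranks first and feature 1 scores zero. A learner would then be trained only on the top-ranked features. Evaluating a selection method amounts to comparing the learner's performance with and without that preprocessing step.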
