Some issues on scalable feature selection

Abstract Feature selection determines the relevant features in the data. It is widely applied in pattern classification, data mining, and machine learning. A pressing concern for feature selection today is that databases are typically very large, both vertically (many instances) and horizontally (many features). In addition, feature sets may keep growing as data collection continues. Effective solutions are needed to meet these practical demands. This paper concentrates on three issues: a large number of features, a large data size, and an expanding feature set. For the first issue, we suggest a probabilistic algorithm for selecting features. For the second issue, we present a scalable probabilistic algorithm that expedites feature selection further and scales up without sacrificing the quality of the selected features. For the third issue, we propose an incremental algorithm that adapts to the newly extended feature set and captures 'concept drift' by removing features from both the previously selected and the newly added ones. We expect that research on scalable feature selection will extend to distributed and parallel computing and will influence applications of data mining and machine learning.
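The probabilistic algorithm mentioned for the first issue can be illustrated as a Las-Vegas-style random subset search: repeatedly draw random feature subsets, score each with an inconsistency criterion, and keep the smallest subset whose inconsistency stays within a tolerance. The sketch below is only an illustration under assumptions, not the paper's exact algorithm; the function names (`inconsistency_count`, `probabilistic_select`), the zero-inconsistency threshold, and the trial budget are our choices.

```python
import random
from collections import defaultdict

def inconsistency_count(data, labels, subset):
    """Count instances that agree on all features in `subset`
    but disagree on the class label (residual after majority vote)."""
    groups = defaultdict(list)
    for row, y in zip(data, labels):
        key = tuple(row[i] for i in subset)
        groups[key].append(y)
    inconsistent = 0
    for ys in groups.values():
        # Instances not matching the majority label of their group.
        inconsistent += len(ys) - max(ys.count(c) for c in set(ys))
    return inconsistent

def probabilistic_select(data, labels, n_features,
                         max_tries=1000, threshold=0, seed=0):
    """Random-subset (Las Vegas style) search: keep the smallest
    subset found whose inconsistency count is within `threshold`."""
    rng = random.Random(seed)
    best = list(range(n_features))  # start from the full feature set
    for _ in range(max_tries):
        k = rng.randint(1, len(best))
        cand = rng.sample(range(n_features), k)
        if (len(cand) <= len(best)
                and inconsistency_count(data, labels, cand) <= threshold):
            best = sorted(cand)
    return best

# Toy data: feature 0 fully determines the label; 1 and 2 are noise.
data = [[0, 1, 0], [0, 0, 1], [1, 1, 1], [1, 0, 0], [0, 1, 1], [1, 0, 1]]
labels = [0, 0, 1, 1, 0, 1]
selected = probabilistic_select(data, labels, n_features=3)
```

Because each candidate subset is drawn independently, the per-trial cost is one pass over the data, which is what makes this filter-style search attractive when the number of features is large.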
