Perspectives of Feature Selection

From here on, we study feature selection for classification. Focusing on this setting lets us examine the most common perspectives on feature selection, develop a solid understanding of its basic issues, survey a wide range of selection methods, and, later in the book, move on to related topics. The problem of feature selection can be examined from many perspectives; the four major ones are: (1) how should we search for the “best” features? (2) by what criteria should candidate features be evaluated, i.e., what determines which features are best? (3) how should new candidate subsets be generated: by adding or deleting a single feature from the current subset, or by changing a whole subset at once (that is, should feature generation proceed sequentially or in parallel)? and (4) how does the application at hand shape feature selection? Applications differ in their requirements on computation time, the form of the results, and so on; for instance, the focus of machine learning (Dietterich, 1997) differs from that of data mining (Fayyad et al., 1996).
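To make perspectives (1)–(3) concrete, below is a minimal sketch of one classic combination: greedy forward search for (1), a cross-validated wrapper accuracy criterion for (2), and sequential generation of candidate subsets by adding one feature at a time for (3). The classifier, dataset, and stopping rule are illustrative assumptions on our part, not choices prescribed here.

```python
# Minimal sketch of sequential forward selection (illustrative only).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_select(X, y, clf, max_features=None):
    """Greedy sequential forward selection with a wrapper criterion."""
    n_features = X.shape[1]
    limit = max_features or n_features
    selected, best_score = [], -np.inf
    while len(selected) < limit:
        # Perspective (3): generate candidates sequentially, one added feature at a time.
        candidates = [f for f in range(n_features) if f not in selected]
        # Perspective (2): the evaluation criterion is cross-validated accuracy.
        scored = [(cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean(), f)
                  for f in candidates]
        score, best_f = max(scored)
        # Perspective (1): greedy search stops when no addition improves the criterion.
        if score <= best_score:
            break
        selected.append(best_f)
        best_score = score
    return selected, best_score

X, y = load_iris(return_X_y=True)
subset, score = forward_select(X, y, KNeighborsClassifier())
print(f"selected features: {subset}, cross-validated accuracy: {score:.3f}")
```

A filter method would replace the cross-validation call with a classifier-independent measure (e.g., a distance or information measure), and a parallel strategy would score whole subsets rather than single-feature additions; the search/criterion/generation skeleton above stays the same.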

References

[1] David Haussler, et al. Occam's Razor, 1987, Inf. Process. Lett.

[2] Jack Sklansky, et al. On Automatic Feature Selection, 1988, Int. J. Pattern Recognit. Artif. Intell.

[3] Larry A. Rendell, et al. The Feature Selection Problem: Traditional Methods and a New Algorithm, 1992, AAAI.

[4] Ron Kohavi, et al. Irrelevant Features and the Subset Selection Problem, 1994, ICML.

[5] Jerome H. Friedman, et al. On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality, 2004, Data Mining and Knowledge Discovery.

[6] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[7] Jeffrey C. Schlimmer, et al. Efficiently Inducing Determinations: A Complete and Systematic Search Algorithm that Uses Optimal Pruning, 1993, ICML.

[8] Thomas G. Dietterich. Machine-Learning Research, 1997, AI Mag.

[9] Ron Kohavi, et al. Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[10] Ron Kohavi, et al. Wrappers for Performance Enhancement and Oblivious Decision Graphs, 1995.

[11] Hiroshi Motoda, et al. Feature Extraction, Construction and Selection, 1998.

[12] Martin T. Hagan, et al. Neural Network Design, 1995.

[13] Leo Breiman, et al. Classification and Regression Trees, 1984.

[14] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.

[15] Hiroshi Motoda, et al. Feature Extraction, Construction and Selection: A Data Mining Perspective, 1998.

[16] Padhraic Smyth, et al. From Data Mining to Knowledge Discovery: An Overview, 1996, Advances in Knowledge Discovery and Data Mining.

[17] Ron Kohavi, et al. The Wrapper Approach, 1998.

[18] Keinosuke Fukunaga, et al. A Branch and Bound Algorithm for Feature Subset Selection, 1977, IEEE Transactions on Computers.

[19] Huan Liu, et al. A Probabilistic Approach to Feature Selection - A Filter Solution, 1996, ICML.

[20] Daphne Koller, et al. Toward Optimal Feature Selection, 1996, ICML.

[21] Shlomo Zilberstein, et al. Using Anytime Algorithms in Intelligent Systems, 1996, AI Mag.

[22] David W. Aha, et al. Feature Weighting for Lazy Learning Algorithms, 1998.

[23] Pedro M. Domingos, et al. Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier, 1996, ICML.

[24] Sholom M. Weiss, et al. Computer Systems That Learn, 1990.

[25] Thomas G. Dietterich, et al. Learning Boolean Concepts in the Presence of Many Irrelevant Features, 1994, Artif. Intell.

[26] Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.