Feature Subset Selection with TAR2less

A repeated empirical result is that machine learners can build adequate models from a small subset of the available features. Learning from such subsets can be faster and produces simpler models. In this paper we present a new method for feature subset selection using the TAR2 treatment learner. TAR2 assumes small backbones; i.e., it assumes that a small number of features suffices for selecting preferred classes. TAR2 can therefore be used as a pre-processor to other learners, identifying useful feature subsets before learning begins. When compared to the methods described in a recent survey by Hall and Holmes (in press), TAR2 found the smallest subsets, with minimal or no loss in classification accuracy.
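To make the idea concrete, here is a minimal sketch of treatment-learner-style feature ranking. This is not the TAR2 implementation: real TAR2 searches for *treatments* (conjunctions of feature=value constraints) that shift the class distribution toward preferred classes. The sketch below approximates that with single-feature lift scores; the function names, data layout, and scoring rule are all illustrative assumptions.

```python
# Hypothetical sketch, NOT the actual TAR2 algorithm: rank features by
# how much their best single value lifts the frequency of a preferred
# class above its baseline frequency, then keep the top-k features.

def lift_scores(rows, labels, preferred):
    """Score each feature by the largest improvement any one of its
    values gives over the baseline rate of the preferred class.

    rows:   list of dicts mapping feature name -> discrete value
    labels: class label for each row
    """
    baseline = sum(1 for y in labels if y == preferred) / len(labels)
    scores = {}
    for feat in rows[0]:
        best = 0.0
        for v in {r[feat] for r in rows}:
            idx = [i for i, r in enumerate(rows) if r[feat] == v]
            hits = sum(1 for i in idx if labels[i] == preferred)
            # improvement over the baseline preferred-class rate
            best = max(best, hits / len(idx) - baseline)
        scores[feat] = best
    return scores

def select_features(rows, labels, preferred, k):
    """Return the k features whose best value most lifts the preferred class."""
    scores = lift_scores(rows, labels, preferred)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

On a toy dataset where feature `a` perfectly predicts the class and `b` is noise, `select_features(rows, labels, "good", 1)` keeps only `a`, mirroring the small-backbone assumption: a handful of features carries most of the signal.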

[1] Igor Kononenko, et al. Estimating Attributes: Analysis and Extensions of RELIEF, 1994, ECML.

[2] Stephen D. Bay, et al. Detecting change in categorical data: mining contrast sets, 1999, KDD '99.

[3] Wynne Hsu, et al. Integrating Classification and Association Rule Mining, 1998, KDD.

[4] Geoff Holmes, et al. Benchmarking Attribute Selection Techniques for Discrete Class Data Mining, 2003, IEEE Trans. Knowl. Data Eng.

[5] Huan Liu, et al. A Probabilistic Approach to Feature Selection - A Filter Solution, 1996, ICML.

[6] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[7] Andrew J. Parkes, et al. Clustering at the Phase Transition, 1997, AAAI/IAAI.

[8] Ada Wai-Chee Fu, et al. Mining association rules with weighted items, 1998, IDEAS '98.

[9] Mark A. Hall, et al. Correlation-based Feature Selection for Machine Learning, 2003.

[10] Thomas G. Dietterich, et al. Learning with Many Irrelevant Features, 1991, AAAI.

[11] Ke Wang, et al. Mining confident rules without support requirement, 2001, CIKM '01.

[12] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules, 1998, VLDB.

[13] Robert C. Holte, et al. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets, 1993, Machine Learning.

[14] Larry A. Rendell, et al. A Practical Approach to Feature Selection, 1992, ML.

[15] Yiming Yang, et al. A Comparative Study on Feature Selection in Text Categorization, 1997, ICML.

[16] Mark A. Hall, et al. Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning, 1999, ICML.

[17] Tim Menzies, et al. Practical large scale what-if queries: case studies with software risk assessment, 2000, ASE 2000.

[18] Ron Kohavi, et al. Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[19] Leo Breiman, et al. Classification and Regression Trees, 1984.

[20] Susan T. Dumais, et al. Inductive learning algorithms and representations for text categorization, 1998, CIKM '98.

[21] Tim Menzies, et al. Many Maybes Mean (Mostly) the Same Thing, 2004.