Mining for Features to Improve Classification

When first faced with a learning task, it is often not clear what a satisfactory representation of the training data should be, and we are often forced to create some set of features that appear plausible, without any strong confidence that they will yield superior learning. Moreover, we often do not have any prior knowledge of which learning method is best to apply, and thus often try multiple methods in an attempt to find the one that performs best. This paper describes a method called Feature-mine that takes a set of features and augments them with macro-features that test for the occurrence of combinations of values on the original features. Our approach uses associations that are mined from the data to create these new features. Importantly, the method is independent of any learning method, with the creation and selection of new features based simply on the relationship between the attributes and the class labels in
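The idea of macro-features can be illustrated with a small sketch. The abstract does not state how associations are mined or selected, so everything below is an assumption made for illustration: the function names `mine_macro_features` and `augment`, the support and confidence thresholds, and the brute-force enumeration of attribute combinations are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def mine_macro_features(rows, labels, max_size=2, min_support=2, min_confidence=0.8):
    """Find value combinations that are strongly associated with one class label.

    Assumed criteria (not from the paper): a combination is kept when it occurs
    in at least `min_support` rows and at least `min_confidence` of those rows
    share the same label.
    """
    n_attrs = len(rows[0])
    macro = []
    for size in range(2, max_size + 1):
        for attrs in combinations(range(n_attrs), size):
            combo_counts = Counter()        # occurrences of each value combination
            combo_label_counts = Counter()  # occurrences per (combination, label)
            for row, label in zip(rows, labels):
                combo = tuple((a, row[a]) for a in attrs)
                combo_counts[combo] += 1
                combo_label_counts[(combo, label)] += 1
            for (combo, _label), hits in combo_label_counts.items():
                support = combo_counts[combo]
                if support >= min_support and hits / support >= min_confidence:
                    if combo not in macro:
                        macro.append(combo)
    return macro


def augment(rows, macro):
    """Append one 0/1 macro-feature per mined combination to every row."""
    return [
        list(row) + [int(all(row[a] == v for a, v in combo)) for combo in macro]
        for row in rows
    ]


# Toy data: two symbolic attributes and a class label per row.
rows = [("sunny", "hot"), ("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]

macro = mine_macro_features(rows, labels)
augmented = augment(rows, macro)
```

Because the new columns depend only on the attributes and the class labels, the augmented rows could be passed to any learner, which matches the learner-independence the abstract emphasizes.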

Sarah Zelikovitz
