Active sampling for feature selection

In knowledge discovery applications where new features are to be added, an acquisition policy can help select which features to acquire based on their relevance and their cost of extraction. This can be posed as a feature selection problem in which the feature values are not known in advance. We propose a technique that actively samples feature values, with the ultimate goal of choosing between alternative candidate features at minimum sampling cost. Our heuristic algorithm extracts candidate features in the region of the instance space where the feature value is likely to alter our knowledge the most. An experimental evaluation on a standard database shows that it is possible to outperform a random subsampling policy in terms of feature selection accuracy.
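
The idea of sampling where a new value is expected to alter current knowledge the most can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper, whose details the abstract does not give. The sketch assumes binary features, binary class labels that are already known, a unit cost per acquired value, a smoothed mutual-information estimate as the relevance measure, and the two classes as the "regions" of the instance space. The names `mutual_information`, `expected_change`, `select_feature`, and the `oracles` callables are hypothetical.

```python
import math
import random


def mutual_information(counts):
    """Estimate I(feature; class) in nats from a 2x2 table counts[c][v].

    Add-one smoothing keeps the estimate defined when cells are empty.
    """
    total = sum(counts[c][v] + 1 for c in (0, 1) for v in (0, 1))
    joint = [[(counts[c][v] + 1) / total for v in (0, 1)] for c in (0, 1)]
    p_class = [sum(joint[c]) for c in (0, 1)]
    p_value = [joint[0][v] + joint[1][v] for v in (0, 1)]
    return sum(
        joint[c][v] * math.log(joint[c][v] / (p_class[c] * p_value[v]))
        for c in (0, 1)
        for v in (0, 1)
    )


def expected_change(counts, c):
    """Expected absolute change of the relevance estimate if one more
    feature value is acquired for an instance of class c."""
    current = mutual_information(counts)
    n0, n1 = counts[c]
    p_one = (n1 + 1) / (n0 + n1 + 2)  # smoothed P(value = 1 | class c)
    change = 0.0
    for v, p in ((0, 1 - p_one), (1, p_one)):
        trial = [row[:] for row in counts]
        trial[c][v] += 1
        change += p * abs(mutual_information(trial) - current)
    return change


def select_feature(labels, oracles, budget, active=True, rng=None):
    """Acquire up to `budget` feature values, then return the index of the
    candidate feature with the highest estimated relevance and the cost spent.

    oracles[f](i) is an assumed costly call returning feature f of instance i.
    With active=False, a random (feature, class) pair is sampled instead.
    """
    rng = rng or random.Random(0)
    counts = [[[0, 0], [0, 0]] for _ in oracles]
    # unsampled[f][c]: instances of class c whose value of feature f is unknown
    unsampled = [
        [[i for i, y in enumerate(labels) if y == c] for c in (0, 1)]
        for _ in oracles
    ]
    spent = 0
    while spent < budget:
        options = [
            (f, c)
            for f in range(len(oracles))
            for c in (0, 1)
            if unsampled[f][c]
        ]
        if not options:
            break  # every value has already been acquired
        if active:
            f, c = max(
                options, key=lambda fc: expected_change(counts[fc[0]], fc[1])
            )
        else:
            f, c = rng.choice(options)
        pool = unsampled[f][c]
        i = pool.pop(rng.randrange(len(pool)))
        counts[f][c][oracles[f](i)] += 1
        spent += 1
    best = max(range(len(oracles)), key=lambda f: mutual_information(counts[f]))
    return best, spent


# Toy usage: feature 0 copies the label, feature 1 is constant.
labels = [0] * 5 + [1] * 5
oracles = [lambda i: labels[i], lambda i: 0]
best, spent = select_feature(labels, oracles, budget=20)
```

Calling `select_feature` with `active=False` gives a random baseline under the same budget, which is the kind of comparison the abstract describes. The expected-change criterion is only one possible reading of "likely to alter our knowledge the most"; other relevance measures or region definitions could be substituted.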
