Economical active feature-value acquisition through Expected Utility estimation

In many classification tasks, training data have missing feature values that can be acquired at a cost. For building accurate predictive models, acquiring all missing values is often prohibitively expensive or unnecessary, while acquiring a random subset of feature values may not be the most effective approach. The goal of active feature-value acquisition is to incrementally select the feature values that are most cost-effective for improving the model's accuracy. We present two policies, Sampled Expected Utility and Expected Utility-ES, that acquire feature values for inducing a classification model based on an estimate of the expected improvement in model accuracy per unit cost. A comparison of the two policies to each other and to alternative policies demonstrates that Sampled Expected Utility is preferable: it effectively reduces the cost of producing a model of a desired accuracy and exhibits consistent performance across domains.
