Test-cost sensitive classification on data with missing values

In the area of cost-sensitive learning, inductive learning algorithms have been extended to handle different types of costs to better represent misclassification errors. Most of the previous works have only focused on how to deal with misclassification costs. In this paper, we address the equally important issue of how to handle the test costs associated with querying the missing values in a test case. When an attribute contains a missing value in a test case, it may or may not be worthwhile to take the extra effort in order to obtain a value for that attribute, or attributes, depending on how much benefit the new value bring about in increasing the accuracy. In this paper, we consider how to integrate test-cost-sensitive learning with the handling of missing values in a unified framework that includes model building and a testing strategy. The testing strategies determine which attributes to perform the test on in order to minimize the sum of the classification costs and test costs. We show how to instantiate this framework in two popular machine learning algorithms: decision trees and naive Bayesian method. We empirically evaluate the test-cost-sensitive methods for handling missing values on several data sets.

[1]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[2]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[3]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[4]  Marlon Núñez The use of background knowledge in decision tree induction , 2004, Machine Learning.

[5]  Qiang Yang,et al.  Test-cost sensitive naive Bayes classification , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[6]  Clifford Stein,et al.  Introduction to Algorithms, 2nd edition. , 2001 .

[7]  Peter D. Turney Types of Cost in Inductive Concept Learning , 2002, ArXiv.

[8]  Ming Tan,et al.  Cost-sensitive learning of classification knowledge and its applications in robotics , 2004, Machine Learning.

[9]  Peter D. Turney Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm , 1994, J. Artif. Intell. Res..

[10]  Steven W. Norton Generating Better Decision Trees , 1989, IJCAI.

[11]  Qiang Yang,et al.  Decision trees with minimal costs , 2004, ICML.

[12]  Thomas G. Dietterich,et al.  Pruning Improves Heuristic Search for Cost-Sensitive Learning , 2002, ICML.

[13]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[14]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[15]  David G. Stork,et al.  Pattern Classification , 1973 .

[16]  Kai Ming Ting,et al.  Inducing Cost-Sensitive Trees via Instance Weighting , 1998, PKDD.

[17]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[18]  Dan Roth,et al.  Learning cost-sensitive active classifiers , 2002, Artif. Intell..

[19]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[20]  Marlon Núñez,et al.  Economic Induction: A Case Study , 1988, EWSL.

[21]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.