"Missing is useful": missing values in cost-sensitive decision trees

Many real-world data sets for machine learning and data mining contain missing values, and much previous research treats them as a problem, attempting to impute the missing values before training and testing. In this paper, we study this issue in cost-sensitive learning, which considers both test costs and misclassification costs. If obtaining the values of some attributes (tests) is too expensive, it can be more cost-effective to leave those values missing, much as expensive and risky tests are skipped (producing missing values) in patient diagnosis (classification). That is, "missing is useful": leaving values missing can actually reduce the total cost of tests and misclassifications, so it is not meaningful to impute them. We discuss and compare several strategies that use only known values and exploit the fact that "missing is useful" for cost reduction in cost-sensitive decision tree learning.
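To make the trade-off concrete, here is a minimal Python sketch of the decision a cost-sensitive tree faces at a node: pay for a test only if its cost is outweighed by the expected drop in misclassification cost, and otherwise leave the value missing. This is not the paper's algorithm; all function names, probabilities, and costs below are illustrative assumptions.

```python
# A minimal sketch of the "missing is useful" idea: at a node, perform a test
# only if its cost is outweighed by the expected reduction in
# misclassification cost. All names and numbers here are illustrative.

def expected_misclassification_cost(class_probs, cost_matrix):
    """Cost of predicting the cheapest class under a class distribution.
    class_probs: dict mapping class label -> probability.
    cost_matrix: cost_matrix[predicted][actual] = misclassification cost.
    """
    return min(
        sum(p * cost_matrix[pred][actual] for actual, p in class_probs.items())
        for pred in cost_matrix
    )

def worth_testing(test_cost, prior_probs, outcome_dists, cost_matrix):
    """Decide whether paying test_cost beats skipping the test.
    outcome_dists: dict mapping each test outcome -> (outcome probability,
    class distribution given that outcome).
    """
    # Expected cost if we skip the test: classify on the prior alone.
    cost_without = expected_misclassification_cost(prior_probs, cost_matrix)
    # Expected cost if we pay for the test, then classify on each outcome.
    cost_with = test_cost + sum(
        p_outcome * expected_misclassification_cost(dist, cost_matrix)
        for p_outcome, dist in outcome_dists.values()
    )
    return cost_with < cost_without  # False means "missing is useful"

# Illustrative numbers: a 50-unit test that only modestly sharpens the class
# distribution does not pay for itself under moderate misclassification costs.
costs = {"+": {"+": 0, "-": 100}, "-": {"+": 200, "-": 0}}
prior = {"+": 0.4, "-": 0.6}
outcomes = {"pos": (0.5, {"+": 0.55, "-": 0.45}),
            "neg": (0.5, {"+": 0.25, "-": 0.75})}
print(worth_testing(50.0, prior, outcomes, costs))  # prints False
```

Under these made-up numbers, classifying without the test costs 60 in expectation, while performing it costs 50 + 47.5 = 97.5, so the cheaper action is to skip the test and accept the missing value.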
