Cost-Sensitive Decision Trees Applied to Medical Data

Classification plays an important role in medicine, especially in medical diagnosis. Health applications often require classifiers that minimize the total cost, including misclassification costs and test costs. There are many reasons for considering costs in medicine, as diagnostic tests are not free and health budgets are limited. Our aim in this work was to define, implement and test a strategy for cost-sensitive learning. We defined an algorithm for decision tree induction that considers several types of cost, including test costs, delayed costs and costs associated with risk. We then applied our strategy to train and evaluate cost-sensitive decision trees on medical data. The resulting trees can be tested under several strategies, including group costs, common costs, and individual costs. Using a "risk" factor, it is possible to penalize invasive or delayed tests and obtain patient-friendly decision trees.
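As an illustration of how test costs and a risk factor might enter attribute selection during tree induction, the following is a minimal sketch. It is not the criterion defined in this work, which the abstract does not specify. It assumes a hypothetical heuristic in which the information gain of each candidate test is divided by its cost multiplied by its risk factor. The function names, the attribute names, and the cost and risk values are all invented for the example, and test costs are assumed to be strictly positive.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on a nominal attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def cost_adjusted_gain(rows, labels, attribute, test_cost, risk_factor=1.0):
    """Hypothetical criterion: gain per unit of risk-weighted test cost.

    A risk_factor above 1.0 penalizes invasive or delayed tests.
    Assumes test_cost > 0.
    """
    return information_gain(rows, labels, attribute) / (test_cost * risk_factor)


def choose_test(rows, labels, costs, risks):
    """Pick the attribute with the highest cost-adjusted gain."""
    return max(
        costs,
        key=lambda a: cost_adjusted_gain(rows, labels, a, costs[a], risks.get(a, 1.0)),
    )


# Invented toy data: both tests separate the classes perfectly.
rows = [
    {"blood": "high", "biopsy": "pos"},
    {"blood": "high", "biopsy": "pos"},
    {"blood": "low", "biopsy": "neg"},
    {"blood": "low", "biopsy": "neg"},
]
labels = ["sick", "sick", "healthy", "healthy"]
costs = {"blood": 20.0, "biopsy": 10.0}

cheapest_first = choose_test(rows, labels, costs, {})
patient_friendly = choose_test(rows, labels, costs, {"biopsy": 4.0})
```

In this toy setting the two tests are equally informative, so the cheaper one is preferred when no risk is specified, while a risk factor of 4.0 on the invasive test shifts the choice to the other test.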
