Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning

Cost-sensitive learning algorithms are typically designed to minimize total cost when multiple costs are taken into account. Like other learning algorithms, they face a significant challenge in applied settings: over-fitting. That is, they can fit the training data well yet often fail to produce an optimal model on unseen data in real-world applications. This paper addresses over-fitting by designing three simple and efficient strategies, feature selection, smoothing, and threshold pruning, for the TCSDT (test cost-sensitive decision tree) method. Feature selection pre-processes the data set before the TCSDT algorithm is applied, while smoothing and threshold pruning are applied within the TCSDT algorithm before the class probability estimate at each decision tree leaf is computed. To evaluate these approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing over-fitting.
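The abstract does not spell out which smoothing scheme is applied at the leaves, but a standard instance of the idea is the Laplace correction, which pulls raw leaf frequency estimates away from 0 and 1 and thereby tempers over-confident predictions on small leaves. The sketch below is illustrative only; the function name and the choice of Laplace smoothing are assumptions, not the paper's exact procedure.

```python
def laplace_estimate(class_counts, target_class, num_classes):
    """Laplace-smoothed probability estimate for one class at a tree leaf.

    class_counts: dict mapping class label -> number of training
    examples of that class reaching this leaf.
    Smoothing adds one pseudo-count per class, so a leaf with few
    examples never yields an extreme estimate of exactly 0 or 1.
    """
    n = sum(class_counts.values())          # total examples at the leaf
    k = class_counts.get(target_class, 0)   # examples of the target class
    return (k + 1) / (n + num_classes)

# A leaf holding 3 positive and 0 negative training examples:
# the raw frequency estimate would be 1.0, but smoothing gives
# (3 + 1) / (3 + 2) = 0.8.
p = laplace_estimate({"pos": 3, "neg": 0}, "pos", num_classes=2)
```

Less extreme leaf estimates matter particularly in cost-sensitive settings, where the predicted class is chosen by comparing estimated probabilities against cost-derived thresholds rather than 0.5.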
