Improving decision tree performance by exception handling

This paper focuses on improving decision tree induction algorithms when a particular kind of tie arises during rule generation on certain training datasets. The tie occurs when a leaf node's records contain the target class outcomes in equal proportions, so majority voting cannot decide the prediction. To handle this exception, we propose basing the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN) and association rule mining (ARM). The other features used to split the parent nodes are also taken into account.
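The tie-handling idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Records are assumed to be tuples of categorical features. The NB and k-NN fallbacks are assumed to be fitted on the full training set (`X`, `y`) and scored only over the tied classes. The names `predict_leaf`, `naive_bayes_predict`, `knn_predict` and `tied_classes` are hypothetical. The ARM fallback and the paper's use of the parent nodes' splitting features are not modelled; every feature is used.

```python
# Hypothetical sketch: majority vote at a decision tree leaf, with an NB or k-NN
# fallback when the vote is tied. Assumes categorical features stored as tuples.
from collections import Counter
import math


def tied_classes(labels):
    """Return the set of classes that share the highest count among a leaf's records."""
    counts = Counter(labels)
    top = max(counts.values())
    return {c for c, n in counts.items() if n == top}


def majority_vote(labels):
    """Return the leaf's majority class, or None when two or more classes tie."""
    tied = tied_classes(labels)
    return next(iter(tied)) if len(tied) == 1 else None


def naive_bayes_predict(X, y, query, candidates):
    """Categorical naive Bayes with Laplace-smoothed likelihoods, scored only over the tied classes."""
    class_counts = Counter(y)
    best, best_score = None, -math.inf
    for c in sorted(candidates):  # sorted so equal scores are broken deterministically
        rows = [r for r, label in zip(X, y) if label == c]
        score = math.log(class_counts[c] / len(y))  # log prior P(c)
        for j, value in enumerate(query):
            n_values = len({r[j] for r in X} | {value})  # distinct values of feature j
            matches = sum(1 for r in rows if r[j] == value)
            score += math.log((matches + 1) / (len(rows) + n_values))  # log P(x_j | c)
        if score > best_score:
            best, best_score = c, score
    return best


def knn_predict(X, y, query, candidates, k=3):
    """k-NN with Hamming distance over categorical features, restricted to the tied classes."""
    neighbours = sorted(
        (sum(a != b for a, b in zip(r, query)), label)
        for r, label in zip(X, y)
        if label in candidates
    )
    votes = Counter(label for _, label in neighbours[:k])
    # On equal vote counts, the class whose neighbour appears earlier in the sorted list wins.
    return votes.most_common(1)[0][0]


def predict_leaf(leaf_labels, X, y, query, method="nb"):
    """Majority vote at a leaf; on a tie, fall back to NB ("nb") or k-NN (any other value)."""
    winner = majority_vote(leaf_labels)
    if winner is not None:
        return winner
    tied = tied_classes(leaf_labels)
    if method == "nb":
        return naive_bayes_predict(X, y, query, tied)
    return knn_predict(X, y, query, tied)


# Usage on a toy dataset (illustrative data, not from the paper).
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool"), ("sunny", "cool")]
y = ["no", "no", "yes", "yes", "yes"]
print(predict_leaf(["yes", "no"], X, y, ("rain", "mild")))  # tied leaf, NB picks "yes"
```

Restricting both fallbacks to the tied classes keeps the leaf's own evidence in charge: the extra classifier only chooses among the outcomes the leaf could not separate, rather than overriding the tree.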
