Improving Decision Trees Using Tsallis Entropy

The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees, but most are greedy and therefore converge only to local optima. Moreover, common split criteria such as Shannon entropy, Gain Ratio, and the Gini index lack adjustable parameters and thus cannot be tuned to a given data set. To address these issues, we propose a series of novel methods based on Tsallis entropy. First, we propose a Tsallis Entropy Criterion (TEC) algorithm that unifies Shannon entropy, Gain Ratio, and the Gini index, thereby generalizing the split criteria of decision trees. Second, we propose a Tsallis Entropy Information Metric (TEIM) algorithm for the efficient construction of decision trees; TEIM exploits the adaptability of Tsallis conditional entropy and the reduced greediness of a two-stage approach. Experimental results on UCI data sets indicate that the TEC algorithm achieves statistically significant improvements over the classical algorithms, and that the TEIM algorithm yields significantly better decision trees in both classification accuracy and tree complexity.
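The unification claimed for TEC rests on a standard property of Tsallis entropy: S_q(p) = (1 - Σ p_i^q)/(q - 1) recovers Shannon entropy in the limit q → 1 and equals the Gini impurity 1 - Σ p_i² at q = 2. The sketch below illustrates only this limiting behavior; it is not the paper's TEC or TEIM algorithm, and the function name is our own.

```python
import math

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q(p) = (1 - sum p_i^q) / (q - 1).

    Limiting cases (the basis of the unification discussed above):
      q -> 1 : Shannon entropy -sum p_i * ln(p_i)
      q  = 2 : Gini impurity 1 - sum p_i^2
    """
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit: Shannon entropy in nats
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

# A balanced binary split: q = 2 gives the Gini impurity 0.5,
# while q = 1 gives the Shannon entropy ln(2) ≈ 0.693.
balanced = [0.5, 0.5]
gini = tsallis_entropy(balanced, 2.0)
shannon = tsallis_entropy(balanced, 1.0)
```

In a decision-tree setting, q becomes a tunable hyperparameter of the split criterion, which is the flexibility the abstract argues fixed criteria lack.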
