Unifying the Split Criteria of Decision Trees Using Tsallis Entropy

The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. ID3, C4.5, and CART are classical decision tree algorithms, and their split criteria are Shannon entropy, Gain Ratio, and the Gini index, respectively. Although these split criteria appear to be independent, they can in fact be unified within a Tsallis entropy framework. Tsallis entropy is a generalization of Shannon entropy and offers a new way to improve decision tree performance through an adjustable parameter q. In this paper, we propose a Tsallis Entropy Criterion (TEC) algorithm that unifies Shannon entropy, Gain Ratio, and the Gini index, thereby generalizing the split criteria of decision trees. More importantly, we reveal the relationship between Tsallis entropy with different values of q and the other split criteria. Experimental results on UCI data sets show that the TEC algorithm achieves statistically significant improvements over the classical algorithms.
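As a rough illustration only (not the paper's TEC implementation), the sketch below computes the Tsallis entropy of a node's class distribution and shows the two standard special cases: the q → 1 limit recovers Shannon entropy, and q = 2 gives the Gini index. The function name and example distribution are hypothetical; how Gain Ratio fits into the framework is the paper's contribution and is not reproduced here.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    As q -> 1 this reduces to Shannon entropy; at q = 2 it equals
    the Gini index 1 - sum_i p_i^2.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # ignore empty classes
    if np.isclose(q, 1.0):
        return -np.sum(p * np.log(p))  # Shannon limit (in nats)
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

# Hypothetical class distribution at a candidate split node.
p = [0.5, 0.3, 0.2]
print(tsallis_entropy(p, q=1.0))  # Shannon entropy
print(tsallis_entropy(p, q=2.0))  # Gini index: 1 - (0.25 + 0.09 + 0.04) = 0.62
```

In a decision tree, such an impurity measure would be evaluated before and after a candidate split, and the split maximizing the impurity reduction would be chosen; varying q then interpolates between the classical criteria.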
