Inter-node Hellinger Distance based Decision Tree

This paper introduces a new splitting criterion for constructing decision trees, called Inter-node Hellinger Distance (iHD), together with a weighted version of it (iHDw). iHD uses the Hellinger distance to measure the distance between the parent node and each of the child nodes in a split. We prove that this ensures the mutual exclusiveness of the child nodes. The weight term in iHDw accounts for the purity of each individual child node while taking the class imbalance problem into account. The combination of the distance and weight terms in iHDw thus favors partitions whose child nodes are purer and mutually exclusive, while remaining skew-insensitive. We evaluate the approach on twenty balanced and twenty imbalanced datasets. The results show that decision trees based on iHD outperform six other state-of-the-art methods on at least 14 balanced and 10 imbalanced datasets. We also observe that adding the weight to iHD improves the performance of decision trees on imbalanced datasets. Moreover, according to the Friedman test, this improvement over the other methods is statistically significant.
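To make the parent-to-child distance concrete, here is a minimal Python sketch. It assumes that each node is represented by the class distribution of its samples and that an iHD-style score simply sums the Hellinger distances between the parent and each child. Both the aggregation and the function names (`class_distribution`, `hellinger`, `inter_node_hd`) are hypothetical. The iHDw weight term is not modeled because the abstract does not specify it.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def class_distribution(labels: Sequence[Hashable], classes: Sequence[Hashable]) -> list[float]:
    """Relative frequency of each class in a node (0.0 for classes absent from the node)."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[c] / n for c in classes]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2), ranging over [0, sqrt(2)]."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def inter_node_hd(parent: Sequence[Hashable], children: Sequence[Sequence[Hashable]]) -> float:
    """Hypothetical iHD-style score: sum of parent-to-child Hellinger distances.

    ASSUMPTION: distances are computed between class distributions and summed
    without weights; the paper's exact definition (and the iHDw weight) may differ.
    """
    classes = list(dict.fromkeys(parent))  # class order as first seen in the parent
    p = class_distribution(parent, classes)
    # Skip empty children to avoid dividing by zero in class_distribution.
    return sum(hellinger(p, class_distribution(child, classes)) for child in children if child)


parent = ["a"] * 4 + ["b"] * 4
pure_split = [["a"] * 4, ["b"] * 4]                          # each child holds one class
mixed_split = [["a", "a", "b", "b"], ["a", "a", "b", "b"]]   # children mirror the parent
print(inter_node_hd(parent, pure_split))   # ≈ 1.5307
print(inter_node_hd(parent, mixed_split))  # 0.0
```

Under this sketch, a split whose children reproduce the parent's class distribution scores 0, while a split that separates the classes moves each child away from the parent and scores higher, which is the behavior a splitting criterion should reward.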
