A new metric splitting criterion for decision trees

We examine a new approach to building decision trees by introducing a geometric splitting criterion based on the properties of a family of metrics on the space of partitions of a finite set. This criterion can be adapted to the characteristics of the data sets and the needs of the users, and it yields decision trees that are smaller and have fewer leaves than trees built with standard methods, with comparable or better accuracy.
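To make the idea of a metric splitting criterion concrete, the sketch below picks the attribute whose induced partition of the rows lies closest to the partition induced by the class label. The abstract does not specify which metric from the family is used, so this example assumes the Shannon-entropy distance d(π, σ) = H(π | σ) + H(σ | π) as a stand-in. The names `conditional_entropy`, `partition_distance`, and `best_split` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def conditional_entropy(xs, ys):
    """Shannon conditional entropy H(X | Y) of two labelings of the same rows."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # block sizes of the common refinement
    marg_y = Counter(ys)           # block sizes of the partition induced by ys
    return -sum(c / n * log2(c / marg_y[y]) for (_, y), c in joint.items())


def partition_distance(xs, ys):
    """Assumed stand-in metric: d(X, Y) = H(X | Y) + H(Y | X)."""
    return conditional_entropy(xs, ys) + conditional_entropy(ys, xs)


def best_split(rows, attributes, target):
    """Return the attribute whose partition is closest to the class partition."""
    labels = [row[target] for row in rows]
    return min(
        attributes,
        key=lambda a: partition_distance([row[a] for row in rows], labels),
    )


# Toy data (illustrative only): "a" reproduces the class partition, "b" is independent of it.
rows = [
    {"a": "x", "b": 0, "cls": "yes"},
    {"a": "x", "b": 1, "cls": "yes"},
    {"a": "y", "b": 0, "cls": "no"},
    {"a": "y", "b": 1, "cls": "no"},
]
chosen = best_split(rows, ["a", "b"], "cls")
```

On this toy data, attribute `a` induces the same partition as the class label, so its distance is zero and it is selected. Unlike information gain, a distance of this kind also penalizes attributes that split the rows more finely than the class partition requires, which is one plausible reason a metric criterion could favor smaller trees.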
