Binary Partitions with Approximate Minimum Impurity

Splitting attributes is one of the main steps in the construction of decision trees. To decide on the best split, impurity measures such as entropy and Gini are widely used. In practice, when handling nominal attributes with a large number of distinct values, decision-tree inducers rely on heuristics to find splits of small impurity. However, there are no known guarantees on the quality of the splits produced by these heuristics. To fill this gap, we propose two new splitting procedures that provably achieve near-optimal impurity. We also report experiments that provide evidence that the proposed methods are attractive candidates for splitting nominal attributes with many values during decision tree and random forest induction.
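
As an illustration of the impurity computation the abstract refers to, the sketch below (in Python, with hypothetical helper names) evaluates the weighted entropy or Gini impurity of one candidate binary partition of a nominal attribute. It shows the quantity that the proposed procedures approximately minimize, not the procedures themselves; searching exhaustively over all 2^(k-1) such partitions is exactly what becomes impractical when the attribute has many distinct values.

```python
import math
from collections import Counter

def impurity(labels, measure="gini"):
    """Gini or entropy impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [c / n for c in Counter(labels).values()]
    if measure == "gini":
        return 1.0 - sum(p * p for p in probs)
    return -sum(p * math.log2(p) for p in probs)

def weighted_split_impurity(rows, left_values, measure="gini"):
    """Weighted impurity of the binary partition that sends the nominal
    values in `left_values` to the left branch and all others to the right.
    `rows` is a list of (attribute_value, class_label) pairs."""
    left = [y for v, y in rows if v in left_values]
    right = [y for v, y in rows if v not in left_values]
    n = len(rows)
    return (len(left) / n) * impurity(left, measure) + \
           (len(right) / n) * impurity(right, measure)

# Toy example: nominal attribute with values {red, blue, green} and binary classes.
rows = [("red", 0), ("red", 0), ("blue", 1), ("green", 1), ("blue", 0), ("green", 1)]
print(weighted_split_impurity(rows, {"red"}, "gini"))     # partition {red} vs {blue, green}
print(weighted_split_impurity(rows, {"red"}, "entropy"))
```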
