From Random Forest to an interpretable decision tree - An evolutionary approach

Random Forest (RF) is one of the most popular and effective machine learning algorithms, known for its strong predictive performance, versatility, and stability. However, an ensemble of decision trees (DTs) is a black-box classifier, while interpretability and explainability are among the top trends in artificial intelligence, aimed at making predictors more trustworthy and reliable. In this paper, we propose an evolutionary algorithm that extracts a single DT mimicking the original RF model in terms of predictive power. The initial population is composed of trees from the RF. During evolution, genetic operators modify the individuals (DTs) and exploit the initial (genetic) material, e.g., splits/tests in the tree nodes or larger subtrees of the DTs. The results show that the classification accuracy of the single DT predictor is not worse than that of the original RF. At the same time, and perhaps most importantly, the resulting classifier is a single, smaller DT that is almost self-explanatory.
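The evolutionary loop outlined above (an RF-seeded population, subtree crossover, mutation reusing split material from the forest, and a fitness that rewards agreement with the RF while penalizing tree size) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the tuple-based tree encoding, the operator probabilities, and the size-penalty weight `alpha` are all assumptions made for the example.

```python
import random

# A tree is a nested tuple: ('leaf', label) or
# ('split', feature_index, threshold, left_subtree, right_subtree).

def predict(tree, x):
    if tree[0] == 'leaf':
        return tree[1]
    _, f, t, left, right = tree
    return predict(left, x) if x[f] <= t else predict(right, x)

def size(tree):
    return 1 if tree[0] == 'leaf' else 1 + size(tree[3]) + size(tree[4])

def fitness(tree, X, y_rf, alpha=0.01):
    # Reward agreement with the RF's predictions, penalize tree size
    # (alpha is an assumed trade-off weight).
    agree = sum(predict(tree, x) == y for x, y in zip(X, y_rf)) / len(X)
    return agree - alpha * size(tree)

def random_subtree(tree, rng):
    # Collect all subtrees (including leaves) and pick one uniformly.
    nodes = []
    def walk(t):
        nodes.append(t)
        if t[0] == 'split':
            walk(t[3]); walk(t[4])
    walk(tree)
    return rng.choice(nodes)

def replace_random(tree, new, rng):
    # Graft `new` at a randomly chosen position, returning a fresh tree.
    if tree[0] == 'leaf' or rng.random() < 0.3:
        return new
    _, f, t, left, right = tree
    if rng.random() < 0.5:
        return ('split', f, t, replace_random(left, new, rng), right)
    return ('split', f, t, left, replace_random(right, new, rng))

def crossover(a, b, rng):
    # Subtree crossover: copy a random subtree of b into a.
    return replace_random(a, random_subtree(b, rng), rng)

def mutate(tree, splits, rng):
    # Reuse split material harvested from the initial forest.
    f, t = rng.choice(splits)
    new_split = ('split', f, t, ('leaf', 0), ('leaf', 1))
    if tree[0] == 'leaf':
        return new_split
    return replace_random(tree, new_split, rng)

def evolve(population, X, y_rf, splits, generations=30, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        children = []
        for _ in range(len(population)):
            a, b = rng.sample(population, 2)
            child = crossover(a, b, rng)
            if rng.random() < 0.2:
                child = mutate(child, splits, rng)
            children.append(child)
        # Elitist survivor selection: keep the fittest trees.
        merged = population + children
        merged.sort(key=lambda t: fitness(t, X, y_rf), reverse=True)
        population = merged[:len(population)]
    return population[0]
```

In practice the initial population would be the trees extracted from a trained RF (e.g., via scikit-learn's `estimators_`), and `y_rf` would be the forest's predictions on a reference set, so the evolved DT is selected for mimicking the ensemble rather than the raw labels.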
