Rule Covering for Interpretation and Boosting

We propose two algorithms for the interpretation and boosting of tree-based ensemble methods. Both algorithms make use of mathematical programming models built from a set of rules extracted from an ensemble of decision trees. The objective is to cover all samples with as few rules as possible while minimizing the total impurity. The first algorithm works with the collection of decision trees obtained from a trained random forest model. Our numerical results show that the proposed rule covering approach selects only a few rules that can be used to interpret the random forest model, and the selected rules closely match the accuracy of the random forest itself. Inspired by the column generation algorithm in linear programming, our second algorithm uses a rule generation scheme for boosting decision trees. We use the dual optimal solutions of the linear programming models as sample weights so that only rules that would improve the accuracy are generated. A computational study shows that our second algorithm performs competitively with well-known boosting methods. Our implementations also demonstrate that both algorithms can be trivially coupled with existing random forest and decision tree packages.
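A minimal sketch of the first idea, not the authors' implementation: treat every leaf of every tree in a trained scikit-learn random forest as a rule, build a 0-1 set-covering model whose costs combine leaf impurity with a small per-rule penalty, and solve it with SciPy's MILP interface. The cost definition, the dataset, and helper details below are illustrative assumptions.

```python
# Sketch: select a small, low-impurity subset of leaf rules that covers all samples.
# Assumes scikit-learn and SciPy >= 1.9; the cost definition is an illustrative choice.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0).fit(X, y)

# Each leaf of each tree is one rule; build the 0/1 coverage matrix (samples x rules)
# and an impurity-based cost per rule.
cover_cols, costs = [], []
for tree in rf.estimators_:
    leaf_of = tree.apply(X)            # leaf index reached by every training sample
    impurity = tree.tree_.impurity     # Gini impurity of every node in the tree
    for leaf in np.unique(leaf_of):
        cover_cols.append((leaf_of == leaf).astype(float))
        # cost = impurity of the leaf plus a small constant penalizing every extra rule
        costs.append(impurity[leaf] + 1e-2)

A = np.column_stack(cover_cols)        # A[i, j] = 1 if rule j covers sample i
c = np.array(costs)

# Set-covering MILP: minimize total cost s.t. every sample is covered by >= 1 selected rule.
res = milp(c=c,
           constraints=LinearConstraint(A, lb=1, ub=np.inf),
           integrality=np.ones_like(c),
           bounds=Bounds(0, 1))
selected = np.flatnonzero(res.x > 0.5)
print(f"{len(selected)} rules selected out of {A.shape[1]}")
```

The selected columns correspond to human-readable leaf rules, so the size of `selected` is a rough proxy for how compact the resulting interpretation is.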

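A similarly hedged sketch of the second idea: solve the LP relaxation of the covering model over the rules generated so far, take the magnitudes of the duals of the sample-coverage constraints as sample weights, and fit the next tree with those weights so that its leaves become candidate rules for the samples the restricted master problem currently prices highest. The loop structure, stopping rule, and weight normalization below are assumptions, not the paper's exact scheme.

```python
# Sketch: column-generation-style rule generation -- LP duals of the covering model
# become sample weights for the next tree.  Loop and normalization are assumptions.
import numpy as np
from scipy.optimize import linprog
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
n = X.shape[0]

def leaf_rules(tree, X):
    """Return the 0/1 coverage columns and impurity-based costs of a fitted tree's leaves."""
    leaf_of = tree.apply(X)
    impurity = tree.tree_.impurity
    cols = [(leaf_of == leaf).astype(float) for leaf in np.unique(leaf_of)]
    costs = [impurity[leaf] + 1e-2 for leaf in np.unique(leaf_of)]
    return cols, costs

weights = np.full(n, 1.0 / n)          # start from uniform sample weights
cols, costs = [], []
for _ in range(10):
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=weights)
    new_cols, new_costs = leaf_rules(tree, X)
    cols += new_cols
    costs += new_costs

    # LP relaxation of the covering model: min c'x  s.t.  A x >= 1,  0 <= x <= 1,
    # written as -A x <= -1 for linprog.
    A = np.column_stack(cols)
    res = linprog(c=np.array(costs), A_ub=-A, b_ub=-np.ones(n),
                  bounds=(0, 1), method="highs")

    # Duals of the coverage constraints (one per sample); their magnitudes indicate
    # which samples the restricted master problem finds hardest to cover cheaply.
    duals = np.abs(res.ineqlin.marginals)
    if duals.sum() == 0:               # no sample carries a positive price: stop generating
        break
    weights = duals / duals.sum()

print(f"generated {len(cols)} candidate rules")
```

In this sketch the dual prices play the role of boosting weights: samples that are expensive to cover in the current LP get larger weights, steering the next tree toward rules that could enter the model with negative reduced cost.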