Building more accurate decision trees with the additive tree

Significance: As machine learning applications expand to high-stakes areas such as criminal justice, finance, and medicine, legitimate concerns emerge about high-impact effects of individual mispredictions on people’s lives. As a result, there has been increasing interest in understanding general machine learning models to overcome possible serious risks. Current decision trees, such as Classification and Regression Trees (CART), have played a predominant role in fields such as medicine, due to their simplicity and intuitive interpretation. However, such trees suffer from intrinsic limitations in predictive power. We developed the additive tree, a theoretical approach to generate a more accurate and interpretable decision tree, which reveals connections between CART and gradient boosting. The additive tree exhibits superior predictive performance to CART, as validated on 83 classification tasks.

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
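
To make the idea of a single parameter spanning CART and boosted stumps more concrete, here is a minimal sketch of one way such an interpolation could work. It is an illustration under stated assumptions, not the formalization from the paper: the parameter name `lam`, the helper names `region_weights` and `fit_weighted_stump`, the squared-error criterion, and the toy data are all hypothetical. The assumed scheme gives full weight to samples inside a node's region and weight `lam` to samples outside it. With `lam = 0` the split at a node sees only the samples that reached it, as in recursive partitioning; with `lam = 1` every split is fitted on all samples, as a stump in a boosted ensemble would be.

```python
from typing import List, Sequence, Tuple


def region_weights(in_region: Sequence[bool], lam: float) -> List[float]:
    """Assumed weighting: 1.0 inside the node's region, lam outside it."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [1.0 if inside else lam for inside in in_region]


def weighted_mean(ys: Sequence[float], ws: Sequence[float]) -> float:
    """Weighted average; returns 0.0 when the total weight is zero."""
    total = sum(ws)
    if total <= 0:
        return 0.0
    return sum(w * y for w, y in zip(ws, ys)) / total


def fit_weighted_stump(
    xs: Sequence[float], ys: Sequence[float], ws: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (threshold, left_value, right_value) for the one-split
    regressor that minimizes weighted squared error on a single feature."""
    candidates = sorted(set(xs))
    best = None
    # Try the midpoint between each pair of adjacent distinct feature values.
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2
        left = [(y, w) for x, y, w in zip(xs, ys, ws) if x <= thr]
        right = [(y, w) for x, y, w in zip(xs, ys, ws) if x > thr]
        lv = weighted_mean([y for y, _ in left], [w for _, w in left])
        rv = weighted_mean([y for y, _ in right], [w for _, w in right])
        err = sum(w * (y - lv) ** 2 for y, w in left)
        err += sum(w * (y - rv) ** 2 for y, w in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


# Toy data (hypothetical): one feature, eight samples.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 1, 5, 5, 9, 9]
# Suppose an earlier split sent samples with x <= 4 to the current node.
in_region = [x <= 4 for x in xs]

for lam in (0.0, 0.5, 1.0):
    stump = fit_weighted_stump(xs, ys, region_weights(in_region, lam))
    print(f"lam={lam}: threshold={stump[0]}, left={stump[1]:.3f}, right={stump[2]:.3f}")
```

At `lam = 0` the fitted stump is the same one obtained by fitting on the in-region samples alone, and at `lam = 1` it is the same one obtained by fitting on the full data set with uniform weights. Intermediate values blend the two behaviors, which is the kind of hybrid model the abstract refers to.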
