How Interpretable and Trustworthy are GAMs?

Generalized additive models (GAMs) have become a leading model class for interpretable machine learning. However, there are many algorithms for training GAMs, and these can learn different or even contradictory models while being equally accurate. Which GAM should we trust? In this paper, we quantitatively and qualitatively investigate a variety of GAM algorithms on real and simulated datasets. We find that GAMs with high feature sparsity (models that use only a few variables to make predictions) can miss patterns in the data and be unfair to rare subpopulations. Our results suggest that inductive bias plays a crucial role in what interpretable models learn, and that tree-based GAMs offer the best balance of sparsity, fidelity, and accuracy and thus appear to be the most trustworthy GAMs.
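A GAM predicts with a sum of per-feature shape functions, E[y] = b0 + f_1(x_1) + ... + f_d(x_d) (identity link in the regression case). One common way to fit a tree-based GAM is cyclic gradient boosting: on each round, every feature in turn gets a shallow tree fit to the current residuals using that feature alone, so the model stays additive. The sketch below is a simplified illustration of that idea, not the paper's code or any library's implementation. The names `StumpGAM` and `fit_stump`, the squared-error loss, depth-1 trees, the hyperparameters, and the simulated data are all assumptions made for illustration. It applies no sparsity penalty, so every feature receives a shape function, including an irrelevant one.

```python
import numpy as np


def fit_stump(x, residual):
    """Hypothetical helper: best single split on one feature under squared error.

    Returns (threshold, left_value, right_value); points with x <= threshold
    get left_value.
    """
    order = np.argsort(x, kind="stable")
    xs, rs = x[order], residual[order]
    n = len(xs)
    csum = np.cumsum(rs)[:-1]          # residual sum of the first i points, i = 1..n-1
    i = np.arange(1, n)
    left_mean = csum / i
    right_mean = (rs.sum() - csum) / (n - i)
    # Minimizing SSE is equivalent to maximizing i*lm^2 + (n-i)*rm^2.
    gain = i * left_mean**2 + (n - i) * right_mean**2
    gain[xs[1:] == xs[:-1]] = -np.inf  # cannot split between tied values
    if n < 2 or np.all(np.isneginf(gain)):
        m = rs.mean()
        return np.inf, m, m            # no valid split: constant stump
    k = np.argmax(gain)
    return 0.5 * (xs[k] + xs[k + 1]), left_mean[k], right_mean[k]


class StumpGAM:
    """Hypothetical minimal tree-based GAM: cyclic boosting of depth-1 trees."""

    def __init__(self, n_rounds=200, learning_rate=0.1):
        self.n_rounds = n_rounds
        self.lr = learning_rate

    def fit(self, X, y):
        n, d = X.shape
        self.intercept_ = y.mean()
        self.stumps_ = [[] for _ in range(d)]  # one list of stumps per feature
        pred = np.full(n, self.intercept_)
        for _ in range(self.n_rounds):
            for j in range(d):  # round-robin over features keeps the model additive
                thr, lv, rv = fit_stump(X[:, j], y - pred)
                lv, rv = self.lr * lv, self.lr * rv  # shrink each update
                self.stumps_[j].append((thr, lv, rv))
                pred += np.where(X[:, j] <= thr, lv, rv)
        return self

    def shape_function(self, j, x):
        """f_j evaluated at values x of feature j (sum of that feature's stumps)."""
        out = np.zeros_like(x, dtype=float)
        for thr, lv, rv in self.stumps_[j]:
            out += np.where(x <= thr, lv, rv)
        return out

    def predict(self, X):
        return self.intercept_ + sum(
            self.shape_function(j, X[:, j]) for j in range(X.shape[1])
        )


# Simulated data: feature 2 is irrelevant and should get a near-flat shape.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, size=500)

model = StumpGAM(n_rounds=200, learning_rate=0.1).fit(X, y)
grid = np.linspace(-2, 2, 9)
for j in range(3):
    print(f"f_{j}:", np.round(model.shape_function(j, grid), 2))
```

Because each shape function is a sum of step functions on a single feature, it can be plotted directly over a grid, which is what makes the model inspectable. The learning rate shrinks every update, so no single early split dominates a shape function.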
