Paper information - How Interpretable and Trustworthy are GAMs?

Generalized additive models (GAMs) have become a leading model class for interpretable machine learning. However, there are many algorithms for training GAMs, and these can learn different or even contradictory models, while being equally accurate. Which GAM should we trust? In this paper, we quantitatively and qualitatively investigate a variety of GAM algorithms on real and simulated datasets. We find that GAMs with high feature sparsity (only using a few variables to make predictions) can miss patterns in the data and be unfair to rare subpopulations. Our results suggest that inductive bias plays a crucial role in what interpretable models learn and that tree-based GAMs represent the best balance of sparsity, fidelity and accuracy and thus appear to be the most trustworthy GAM models.
