Intelligible models for classification and regression

Complex models for regression and classification achieve high accuracy, but they are no longer interpretable by users. We study the performance of generalized additive models (GAMs), which combine single-feature models, called shape functions, through a linear function. Because the shape functions can be arbitrarily complex, GAMs are more accurate than simple linear models; because they contain no interactions between features, they remain easy for users to interpret. We present the first large-scale empirical comparison of existing methods for learning GAMs. Our study covers spline- and tree-based shape functions, combined with penalized least squares, gradient boosting, and backfitting for learning the additive model. We also present a new method based on tree ensembles with an adaptive number of leaves that consistently outperforms previous work. We complement our experimental results with a bias-variance analysis that explains how different shape models influence the additive model. Our experiments show that gradient boosting of shallow bagged trees distinguishes itself as the best method on low- to medium-dimensional datasets.
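To make the learning procedure concrete, the following is a minimal sketch of one of the strategies the abstract describes: fitting a GAM of the form f(x) = b + Σ_j f_j(x_j) by cyclic gradient boosting, where each shape function f_j is an ensemble of shallow regression trees restricted to feature j. The function names, hyperparameters, and the fixed leaf count are illustrative assumptions, not the paper's exact method (which adapts the number of leaves).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gam_boosted(X, y, n_rounds=50, lr=0.1, max_leaves=4):
    """Fit an additive model by cyclic gradient boosting (squared loss).

    Each round, for every feature j, a shallow tree is fit to the current
    residual using only column j, so the learned model stays additive:
    no tree can encode an interaction between features.
    """
    n, d = X.shape
    intercept = y.mean()
    resid = y - intercept                      # residual for squared loss
    shape_funcs = [[] for _ in range(d)]       # per-feature tree ensembles
    for _ in range(n_rounds):
        for j in range(d):
            tree = DecisionTreeRegressor(max_leaf_nodes=max_leaves)
            tree.fit(X[:, [j]], resid)         # single-feature weak learner
            resid -= lr * tree.predict(X[:, [j]])
            shape_funcs[j].append(tree)
    return intercept, shape_funcs

def predict_gam(intercept, shape_funcs, X, lr=0.1):
    """Sum the intercept and every per-feature shape-function contribution."""
    pred = np.full(X.shape[0], intercept, dtype=float)
    for j, trees in enumerate(shape_funcs):
        for tree in trees:
            pred += lr * tree.predict(X[:, [j]])
    return pred
```

Because each f_j depends on a single feature, it can be plotted as a curve against x_j, which is what makes the fitted model intelligible to users.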
