Gaussian Process Boosting

In this article, we propose a novel way to combine boosting with Gaussian process and mixed effects models. This combination relaxes (i) the linearity assumption for the mean function in Gaussian process and mixed effects models in a flexible non-parametric way and (ii) the independence assumption made in most boosting algorithms. Relaxing the former improves predictive accuracy and avoids model misspecification; relaxing the latter allows for more efficient learning of the mean function and for obtaining probabilistic predictions. In addition, we present an extension that scales to large data by applying a Vecchia approximation to the Gaussian process model, relying on novel results for covariance parameter inference. We obtain increased predictive performance compared to existing approaches on several simulated datasets and in applications to house prices and online transactions.
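At a high level, such an algorithm alternates between (i) estimating the covariance parameters of the random effects part given the current mean function and (ii) taking a gradient-boosting step on the mean function whose gradient accounts for the dependence structure. The following is a minimal sketch of this idea for a single grouped random effect, using scikit-learn trees and SciPy; the toy data, hyperparameters, and plain marginal-likelihood optimizer are illustrative assumptions, not the paper's actual implementation (which also covers full Gaussian processes and Vecchia approximations).

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy data: nonlinear mean function plus a grouped random effect.
n, n_groups = 200, 20
X = rng.uniform(-2, 2, size=(n, 1))
group = rng.integers(0, n_groups, size=n)
b = rng.normal(0, 1.0, size=n_groups)          # true random effects
y = np.sin(2 * X[:, 0]) + b[group] + rng.normal(0, 0.3, size=n)

Z = np.eye(n_groups)[group]                     # incidence matrix (n x n_groups)

def neg_log_lik(log_params, resid):
    """Gaussian marginal neg. log-likelihood of resid under Psi = s2b*ZZ' + s2*I."""
    s2b, s2 = np.exp(log_params)                # log-scale keeps variances positive
    Psi = s2b * Z @ Z.T + s2 * np.eye(n)
    _, logdet = np.linalg.slogdet(Psi)
    alpha = np.linalg.solve(Psi, resid)
    return 0.5 * (logdet + resid @ alpha)

F = np.full(n, y.mean())                        # boosting ensemble (mean function)
log_params = np.log([0.5, 0.5])
nu = 0.1                                        # learning rate

for it in range(20):
    resid = y - F
    # Step 1: update covariance parameters given the current mean F.
    opt = minimize(neg_log_lik, log_params, args=(resid,), method="Nelder-Mead")
    log_params = opt.x
    s2b, s2 = np.exp(log_params)
    # Step 2: boosting step on the negative gradient Psi^{-1}(y - F), which,
    # unlike the raw residual, accounts for the dependence among observations.
    Psi = s2b * Z @ Z.T + s2 * np.eye(n)
    grad = np.linalg.solve(Psi, resid)
    tree = DecisionTreeRegressor(max_depth=3).fit(X, grad)
    F += nu * tree.predict(X)
```

With independent observations (Psi proportional to the identity), step 2 reduces to ordinary L2 gradient boosting on the residuals; the dependence enters only through the Psi^{-1} weighting of the pseudo-responses.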
