Latent Gaussian Model Boosting

Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting achieves excellent prediction accuracy on many data sets, but it has potential drawbacks: it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models that explicitly model dependence among samples, allow for efficient learning of predictor functions, and support probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function, which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. In experiments on both simulated and real-world data, the combined approach obtains higher prediction accuracy than existing approaches.
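To make the combination concrete, the sketch below implements a simplified version of the idea for a grouped random effects model y = F(X) + Zb + e with marginal covariance Psi = sigma2 * I + sigma2_b * Z Z^T: trees are boosted on the correlation-adjusted negative gradient Psi^{-1}(y - F) rather than on raw residuals, and the covariance parameters of the latent Gaussian part are re-estimated between boosting steps. This is a minimal illustrative sketch, not the article's actual algorithm: the names (fit_lgm_boost, neg_log_lik) and the generic optimizer-based covariance update are assumptions introduced here.

```python
# Minimal sketch of combined boosting / latent Gaussian modeling for
# grouped random effects: y = F(X) + Zb + e, with b ~ N(0, sigma2_b I)
# and e ~ N(0, sigma2 I). Illustrative only; function names and the
# covariance update are assumptions, not the article's implementation.
import numpy as np
from scipy.optimize import minimize
from sklearn.tree import DecisionTreeRegressor

def neg_log_lik(log_theta, y, F, Z):
    """Gaussian negative log marginal likelihood of the residual y - F
    under the covariance Psi = sigma2 * I + sigma2_b * Z Z^T."""
    sigma2, sigma2_b = np.exp(log_theta)  # log-scale keeps variances positive
    Psi = sigma2 * np.eye(len(y)) + sigma2_b * (Z @ Z.T)
    r = y - F
    _, logdet = np.linalg.slogdet(Psi)
    return 0.5 * (r @ np.linalg.solve(Psi, r) + logdet)

def fit_lgm_boost(X, y, Z, n_iter=50, lr=0.1, max_depth=3):
    """Boost trees for the prior mean F while re-estimating the
    covariance parameters of the latent Gaussian part."""
    F = np.full(len(y), y.mean())
    log_theta = np.log([y.var() / 2, y.var() / 2])  # initial (sigma2, sigma2_b)
    trees = []
    for _ in range(n_iter):
        # Step 1: update covariance parameters given the current mean F.
        log_theta = minimize(neg_log_lik, log_theta, args=(y, F, Z)).x
        sigma2, sigma2_b = np.exp(log_theta)
        Psi = sigma2 * np.eye(len(y)) + sigma2_b * (Z @ Z.T)
        # Step 2: boosting step on the correlation-adjusted negative
        # gradient Psi^{-1}(y - F) instead of the raw residual y - F.
        grad = np.linalg.solve(Psi, y - F)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, grad)
        F = F + lr * tree.predict(X)
        trees.append(tree)
    return trees, np.exp(log_theta), F
```

A prediction for a new sample from a known group would add the estimated random effect E[b | y] to the sum of the tree outputs; for real applications, libraries such as GPBoost implement this kind of combined model far more efficiently than the dense n-by-n linear algebra used in this sketch.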
