Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression

Gradient Boosting Machines (GBM) are hugely popular for solving tabular data problems. However, practitioners are often interested not only in point predictions but also in probabilistic predictions that quantify the uncertainty of those predictions. Producing such probabilistic predictions is difficult with existing GBM-based solutions: they either require training multiple models or become too computationally expensive to be useful in large-scale settings. We propose Probabilistic Gradient Boosting Machines (PGBM), a method that creates probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner. PGBM models the leaf weights of each decision tree as random variables and estimates the mean and variance of the prediction for each sample in a dataset via stochastic tree ensemble update equations. These learned moments allow us to sample from a specified distribution after training. We empirically demonstrate the advantages of PGBM over existing state-of-the-art methods: (i) PGBM produces probabilistic estimates without compromising point performance in a single model; (ii) PGBM learns probabilistic estimates via a single model, without requiring multi-parameter boosting, and thereby offers a speedup of up to several orders of magnitude over existing state-of-the-art methods on large datasets; and (iii) PGBM achieves accurate probabilistic estimates in tasks with complex differentiable loss functions, such as hierarchical time series problems, where we observed up to a 10% improvement in point forecasting performance and up to a 300% improvement in probabilistic forecasting performance.
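
To make the moment-propagation idea concrete, the sketch below shows one way per-sample mean and variance could be accumulated across a boosted ensemble and then used to sample from a specified distribution after training. This is a minimal illustration under stated assumptions, not the paper's implementation: the trees structure, the assign helper, the zero-covariance term between successive trees, and the choice of a normal output distribution are all assumptions made here for brevity.

import numpy as np

# Illustrative sketch of stochastic tree ensemble updates (not PGBM's code).
# Each tree is represented as (leaf_mu, leaf_var, assign), where leaf_mu[j] and
# leaf_var[j] are the estimated mean and variance of leaf j (e.g., derived from
# the gradient statistics of the samples routed to that leaf), and assign(X) is
# a hypothetical helper returning each sample's leaf index for that tree.

def predict_moments(X, trees, lr=0.1, init_mu=0.0):
    n = X.shape[0]
    mu = np.full(n, init_mu)   # running mean of the prediction per sample
    var = np.zeros(n)          # running variance of the prediction per sample
    for leaf_mu, leaf_var, assign in trees:
        idx = assign(X)                # leaf index per sample
        mu = mu + lr * leaf_mu[idx]    # mean update: scale the leaf mean by the learning rate
        var = var + lr**2 * leaf_var[idx]  # variance update, assuming zero covariance between trees
    return mu, var

def sample_predictions(mu, var, n_samples=1000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Sample from a distribution specified after training; a normal is assumed here.
    return rng.normal(mu, np.sqrt(var), size=(n_samples, mu.shape[0]))

# Example usage with a single hypothetical stump that routes every sample to leaf 0:
# trees = [(np.array([0.5]), np.array([0.01]), lambda X: np.zeros(len(X), dtype=int))]
# mu, var = predict_moments(np.zeros((4, 1)), trees)
# draws = sample_predictions(mu, var)

Because the moments are learned independently of the output distribution, the same trained ensemble could be paired with a normal, Student's t, or other distribution at prediction time, which is what allows probabilistic estimates to come from a single point-prediction model.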
