Deep Quantile Aggregation

Conditional quantile estimation is a key statistical learning challenge motivated by the need to quantify uncertainty in predictions or to model a diverse population without being overly reductive. As such, many models have been developed for this problem. Adopting a meta viewpoint, we propose a general framework (inspired by neural network optimization) for aggregating any number of conditional quantile models in order to boost predictive accuracy. We consider weighted ensembling strategies of increasing flexibility where the weights may vary over individual models, quantile levels, and feature values. An appeal of our approach is its portability: we ensure that estimated quantiles at adjacent levels do not cross by applying simple transformations through which gradients can be backpropagated, and this allows us to leverage the modern deep learning toolkit for building quantile ensembles. Our experiments confirm that ensembling can lead to substantial gains in accuracy, even when the constituent models are themselves powerful and flexible.
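To make the ensembling idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes weights that vary over models and quantile levels (but not over feature values), uses a softmax to keep each level's weights on the simplex, and uses sorting as one example of a simple non-crossing transformation; the abstract does not name the transformations actually used. The pinball loss is the standard quantile loss. All function names and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Map unconstrained scores to nonnegative weights that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_quantiles(base_preds, scores):
    """Combine base-model quantile predictions for a single input.

    base_preds[j][k]: prediction of model j at quantile level k.
    scores[k][j]: unconstrained score of model j at level k
                  (assumed parameterization, one weight vector per level).
    """
    n_models = len(base_preds)
    n_levels = len(base_preds[0])
    combined = []
    for k in range(n_levels):
        w = softmax(scores[k])
        combined.append(sum(w[j] * base_preds[j][k] for j in range(n_models)))
    # Sorting guarantees that quantiles at increasing levels do not cross.
    # It only permutes its inputs, so in an autodiff framework gradients
    # would flow back through it to the weights.
    return sorted(combined)


def pinball_loss(y, preds, levels):
    """Average pinball (quantile) loss of the predictions for one target y."""
    total = 0.0
    for q, tau in zip(preds, levels):
        diff = y - q
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(levels)


# Toy example: two base models, three quantile levels.
levels = [0.1, 0.5, 0.9]
base_preds = [[1.0, 2.0, 3.0], [0.5, 2.5, 2.4]]  # second model crosses
scores = [[0.0, 0.0], [-20.0, 20.0], [-20.0, 20.0]]
ensemble = aggregate_quantiles(base_preds, scores)
loss = pinball_loss(2.0, ensemble, levels)
```

In this toy case the weights at the upper two levels strongly favor the second model, whose own quantiles cross; the sort restores monotonicity before the loss is computed. In practice the scores would be learned by minimizing the pinball loss with a gradient-based optimizer, and could be produced by a network of the features to obtain feature-dependent weights.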
