Nonparametric conditional density estimation in a deep learning framework for short-term forecasting

Short-term forecasting is an important tool for understanding environmental processes. In this paper, we incorporate machine learning algorithms into a conditional distribution estimator for the purpose of forecasting tropical cyclone intensity. Many machine learning techniques give only a single-point prediction of the target variable, which does not provide a full accounting of prediction variability. Conditional distribution estimation offers extra insight into predicted response behavior, which can inform decision-making and policy. We propose a technique that simultaneously estimates the entire conditional distribution and flexibly accommodates machine learning methods. A smooth model is fit over both the target variable and the covariates, and a logistic transformation is applied to the model output layer to produce an expression for the conditional density function. We provide two examples of machine learning models that can be used: polynomial regression and deep learning. To achieve computational efficiency, we propose a case-control sampling approximation to the conditional distribution. A simulation study across four data distributions highlights the effectiveness of our method relative to other machine learning-based conditional distribution estimators. We then demonstrate the utility of our approach for forecasting using tropical cyclone data from the Atlantic Seaboard. This paper offers a proof of concept; further computational developments can fully unlock the method's insights in more complex forecasting and other applications.
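The core idea described above — fitting a smooth score surface over the target and covariates, then applying a logistic (normalizing) transformation to the output to obtain a conditional density — can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the model `g(y, x, w)`, its polynomial features, and the weight values are hypothetical stand-ins for the polynomial-regression or deep-learning choices the abstract mentions, and the normalizing integral is approximated on a grid of candidate target values.

```python
import numpy as np

# Hypothetical smooth model g(y, x): polynomial features in the target y and a
# single covariate x, standing in for the paper's polynomial or deep models.
def g(y, x, w):
    feats = np.stack(
        [np.ones_like(y), y, y ** 2, x * y, x * np.ones_like(y)], axis=-1
    )
    return feats @ w

# Logistic transformation of the model output:
#   f(y | x) = exp(g(y, x)) / ∫ exp(g(y', x)) dy',
# with the integral approximated by a Riemann sum over y_grid.
def conditional_density(x, w, y_grid):
    scores = g(y_grid, x, w)
    scores -= scores.max()            # numerical stability before exponentiating
    unnorm = np.exp(scores)
    dy = y_grid[1] - y_grid[0]
    return unnorm / (unnorm.sum() * dy)  # integrates to 1 on the grid

y_grid = np.linspace(-4.0, 4.0, 401)
w = np.array([0.0, 0.5, -0.5, 1.0, 0.0])   # illustrative weights, not fitted
f = conditional_density(x=1.0, w=w, y_grid=y_grid)
dy = y_grid[1] - y_grid[0]
print(float((f * dy).sum()))  # → 1.0: the transformation yields a valid density
```

In practice the score model would be fit by maximum likelihood, and the grid sum over `y_grid` is exactly the term the paper's case-control sampling approximation is designed to make cheaper, since evaluating the full normalizer at every observation is the dominant cost.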
