A Machine Learning Metasystem for Robust Probabilistic Nonlinear Regression-Based Forecasting of Seasonal Water Availability in the US West

Hydroelectric power generation; water supplies for municipal, agricultural, manufacturing, and service-industry uses, including technology-sector requirements; dam safety; flood control; recreational uses; and ecological and legal constraints all place simultaneous, competing demands on the heavily stressed water management infrastructure of the mostly arid American West. Managing these resources optimally depends on predicting water availability. We built a probabilistic nonlinear regression water supply forecast (WSF) technique for the US Department of Agriculture (USDA), which runs the largest stand-alone WSF system in the US West. Design criteria included improved accuracy over the existing system; uncertainty estimates that seamlessly handle complex (heteroscedastic, non-Gaussian) prediction errors; integration of physical hydrometeorological process knowledge and domain-specific expert experience; the ability to accommodate nonlinearity, model selection uncertainty and equifinality, and predictor multicollinearity and high dimensionality; and relatively easy, low-cost implementation. Existing methods satisfied some of these requirements, but none met all of them. We therefore developed a novel, interdisciplinary, and pragmatic prediction metasystem through a carefully considered synthesis of well-established, off-the-shelf components and approaches spanning supervised and unsupervised machine learning, nonparametric statistical modeling, ensemble learning, and evolutionary optimization. The design maintains, but radically updates, the principal components regression framework widely used for WSF. Testing this integrated multi-method prediction engine demonstrated its value for river forecasting, and USDA adoption is a landmark in transitioning machine learning from research into practice in this field. The metasystem's ability to meet all of the foregoing design criteria, which are not unique to WSF, suggests potential for extension to complex probabilistic prediction problems in other fields.
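
For readers unfamiliar with the principal components regression (PCR) framework mentioned in the abstract, the following is a minimal generic sketch of classical PCR, not the metasystem described here. It illustrates only how PCR sidesteps predictor multicollinearity by regressing on a few orthogonal components. The function names (fit_pcr, predict_pcr), the synthetic "snow" and "precip" predictors, and the choice of two retained components are all hypothetical assumptions made for illustration.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Fit a plain principal components regression (illustrative only)."""
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0, ddof=1)
    Z = (X - x_mean) / x_std  # standardize predictors
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T  # keep the leading directions
    scores = Z @ V  # orthogonal component scores
    y_mean = y.mean()
    # Ordinary least squares of the centered target on the retained scores.
    beta, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return {"x_mean": x_mean, "x_std": x_std, "V": V,
            "beta": beta, "y_mean": y_mean}


def predict_pcr(model, X):
    """Predict with a model returned by fit_pcr."""
    Z = (X - model["x_mean"]) / model["x_std"]
    return model["y_mean"] + Z @ model["V"] @ model["beta"]


# Synthetic data (assumed for illustration): two nearly collinear
# "snow" predictors plus one independent "precip" predictor.
rng = np.random.default_rng(0)
n = 40
snow = rng.normal(size=n)
precip = rng.normal(size=n)
X = np.column_stack([
    snow + 0.05 * rng.normal(size=n),
    snow + 0.05 * rng.normal(size=n),
    precip,
])
y = 2.0 * snow + precip + 0.1 * rng.normal(size=n)

model = fit_pcr(X, y, n_components=2)
pred = predict_pcr(model, X)
print("in-sample correlation:", np.corrcoef(pred, y)[0, 1])
```

Dropping the third component here discards the low-variance direction that separates the two nearly identical snow predictors, which is the direction that destabilizes ordinary least squares. This sketch yields only a point prediction; the probabilistic, nonlinear, and ensemble aspects named in the abstract are outside its scope.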
