Reconstruction of GRACE Total Water Storage Through Automated Machine Learning

The Gravity Recovery and Climate Experiment (GRACE) satellite mission and its follow‐on, GRACE‐FO, have provided unprecedented opportunities to quantify the impact of climate extremes and human activities on total water storage at large scales. The ∼1‐year data gap between the two GRACE missions needs to be filled to maintain data continuity and maximize mission benefits. In this study, we applied an automated machine learning (AutoML) workflow to perform gridwise GRACE‐like data reconstruction. AutoML represents a new paradigm for optimal algorithm selection, model structure selection, and hyperparameter tuning, addressing some of the most challenging issues in machine learning applications. We demonstrated the workflow over the conterminous U.S. (CONUS) using six types of machine learning models and multiple groups of meteorological and climatic variables as predictors. Results indicate that the AutoML‐assisted gap filling achieved satisfactory performance over the CONUS. On the testing data, the mean gridwise Nash‐Sutcliffe efficiency is around 0.85, the mean correlation coefficient is around 0.95, and the mean normalized root‐mean‐square error is about 0.09. Trained models maintain good performance when extrapolating to the mission gap and to the GRACE‐FO period (after June 2017). Results further suggest that no single algorithm provides the best predictive performance over the entire CONUS, stressing the importance of using an end‐to‐end workflow to train, optimize, and combine multiple machine learning models to deliver robust performance, especially when building large‐scale hydrological prediction systems and when predictor importance exhibits strong spatial variability.
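The three skill scores quoted above can be computed for a single grid cell along the following lines. This is a minimal sketch, not the study's code: the function names are illustrative, and the normalization of the root-mean-square error by the range of the observations is an assumption, since the abstract does not state which normalization was used.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.corrcoef(obs, sim)[0, 1])


def nrmse(obs, sim):
    """RMSE divided by the observed range (assumed normalization, see lead-in)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    return float(rmse / (obs.max() - obs.min()))
```

Applying these functions to the testing-period series of every grid cell and averaging the results would give gridwise means of the kind reported in the abstract.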
