From calibration to parameter learning: Harnessing the scaling effects of big data in geoscientific modeling

The behavior and skill of models in many geosciences (e.g., hydrology and ecosystem sciences) strongly depend on spatially varying parameters that need calibration. A well-calibrated model can reasonably propagate information from observations to unobserved variables via model physics, but traditional calibration is highly inefficient and results in non-unique solutions. Here we propose a novel differentiable parameter learning (dPL) framework that efficiently learns a global mapping between inputs (and optionally responses) and parameters. Crucially, dPL exhibits beneficial scaling curves not previously demonstrated to geoscientists: as training data increases, dPL achieves better performance, more physical coherence, and better generalizability (across space and uncalibrated variables), all with orders-of-magnitude lower computational cost. We demonstrate examples that learned from soil moisture and streamflow, where dPL drastically outperformed existing evolutionary and regionalization methods, or required only ~12.5% of the training data to achieve similar performance. The generic scheme promotes the integration of deep learning and process-based models, without mandating reimplementation.

Much effort is invested in calibrating model parameters for accurate outputs, but established methods can be inefficient and generic. By learning from big datasets, a new differentiable framework for model parameterization outperforms state-of-the-art methods and produces more physically coherent results, using a fraction of the training data, computational power, and time. The method promotes a deep integration of machine learning with process-based geoscientific models.
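The core idea of learning one global mapping from inputs to parameters, rather than calibrating each site separately, can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "process model" is a one-parameter runoff coefficient, the mapping is a single logistic unit rather than a deep network, the data are synthetic and noise-free, and all names (`process_model`, `make_data`, `train`) are hypothetical. The gradient is written out by hand with the chain rule so that the example needs only the standard library.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def process_model(k, forcing):
    # Toy differentiable "process model" (assumption): runoff = k * precipitation.
    return k * forcing


def make_data(n_sites, n_steps, w_true, b_true, rng):
    # Synthetic sites: each has one static attribute `a`; its true parameter
    # depends on that attribute through a mapping that is hidden from training.
    sites = []
    for _ in range(n_sites):
        a = rng.uniform(-1.0, 1.0)
        k = sigmoid(w_true * a + b_true)
        xs = [rng.uniform(0.0, 10.0) for _ in range(n_steps)]
        ys = [process_model(k, x) for x in xs]
        sites.append((a, xs, ys))
    return sites


def train(sites, epochs=3000, lr=0.05):
    # One shared mapping k = sigmoid(w * a + b) is fitted for ALL sites at once.
    # The loss is evaluated on the process-model output, and its gradient is
    # propagated back through the process model into the mapping's weights.
    w, b = 0.0, 0.0
    n = sum(len(xs) for _, xs, _ in sites)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for a, xs, ys in sites:
            k = sigmoid(w * a + b)
            dk_dz = k * (1.0 - k)  # derivative of the sigmoid
            for x, y in zip(xs, ys):
                # d(squared error)/dk through the process model y_hat = k * x
                dloss_dk = 2.0 * (process_model(k, x) - y) * x
                grad_w += dloss_dk * dk_dz * a
                grad_b += dloss_dk * dk_dz
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
sites = make_data(n_sites=20, n_steps=10, w_true=2.0, b_true=-0.5, rng=rng)
w, b = train(sites)

# The learned mapping yields a parameter for a site that was never trained on.
k_unseen = sigmoid(w * 0.3 + b)
print("learned mapping:", w, b, "parameter at unseen site:", k_unseen)
```

Because the mapping is shared, every additional site adds constraints on the same two weights, which is the mechanism behind the scaling behavior described above; a per-site calibration would instead solve 20 unrelated problems and provide nothing for the unseen site.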
