Towards Trustworthy and Interpretable Deep Learning-assisted Ecohydrological Models

February 12, 2021

Authors
Peishi Jiang¹, Xingyuan Chen¹, Maruti K. Mudunuru¹, Praveen Kumar², Pin Shuai¹, Kyongho Son¹, Alexander Sun³
¹Pacific Northwest National Laboratory, Richland, WA 99352, USA
²University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
³The University of Texas at Austin, Austin, TX 78712, USA

Focal Area(s)
Insights gleaned from complex data (both observed and simulated) using AI, big data analytics, and other advanced methods, including explainable AI and physics- or knowledge-guided AI.

Science Challenge
The transformational science question we plan to address is: How do we leverage in-situ observations and simulations from process-based ecohydrological models to construct interpretable and trustworthy deep learning (DL) models that predict quantities of interest (QoIs) more reliably under hydro-climatic extremes?

Rationale
Research needs or gaps: Watershed responses are driven by atmospheric forcing through land-atmosphere coupling, as well as by integrated surface and subsurface ecohydrological processes. Under global warming, these drivers are leading to more extreme events [1], such as floods and droughts, as well as to shifts in the variability of high-frequency events [2]. To understand how increases in hydro-climatic extremes or altered variability affect watershed dynamics, a variety of ecohydrological models can be employed to simulate watershed responses as QoIs (e.g., runoff, nutrient loading) [3,4], with in-situ observations used to calibrate the models. Simulations from the calibrated process-based model provide new physical insights and further guide follow-up site sampling and measurement activities; this iterative loop is the Model-Experimental Coupling (ModEx) approach. Nevertheless, the traditional ModEx approach with process-based models requires numerous model realizations for calibration or uncertainty quantification, which is computationally expensive, if not prohibitive, for large-scale watershed simulations.
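As a rough illustration of why calibration by repeated realizations is costly, the following Python sketch calibrates a hypothetical single-parameter linear-reservoir model (storage S, runoff Q = S/k) against synthetic observations by Monte Carlo sampling of k. The model, the parameter range, the synthetic forcing, and the helper names (`linear_reservoir`, `CountingModel`, `calibrate`) are illustrative assumptions, not the ecohydrological models discussed here. The point is the bookkeeping: every candidate parameter costs one full forward simulation, so the total cost scales as (number of realizations) × (cost of one run).

```python
import numpy as np


class CountingModel:
    """Wraps a simulator and counts forward runs (hypothetical helper)."""

    def __init__(self, simulate):
        self.simulate = simulate
        self.n_runs = 0

    def __call__(self, *args):
        self.n_runs += 1  # each call is one full realization of the model
        return self.simulate(*args)


def linear_reservoir(precip, k):
    """Toy process-based model (assumption): dS/dt = P - S/k, Q = S/k.

    Explicit Euler with dt = 1, which is stable for k >= 1.
    """
    storage, runoff = 0.0, np.empty(len(precip))
    for t, p in enumerate(precip):
        runoff[t] = storage / k        # outflow from current storage
        storage += p - runoff[t]       # add forcing, remove outflow
    return runoff


def calibrate(model, precip, observed, n_realizations, rng, k_range=(2.0, 20.0)):
    """Brute-force Monte Carlo calibration: one simulation per sampled k."""
    candidates = rng.uniform(k_range[0], k_range[1], size=n_realizations)
    rmse = [np.sqrt(np.mean((model(precip, k) - observed) ** 2)) for k in candidates]
    return candidates[int(np.argmin(rmse))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    precip = rng.exponential(scale=2.0, size=365)          # synthetic daily forcing
    observed = linear_reservoir(precip, 8.0) + rng.normal(0.0, 0.05, size=365)
    model = CountingModel(linear_reservoir)
    k_best = calibrate(model, precip, observed, n_realizations=2000, rng=rng)
    print(f"k = {k_best:.2f} after {model.n_runs} forward runs")
```

For a real watershed model, where a single run is far more expensive than this toy, the same loop over thousands of realizations is exactly the bottleneck a fast emulator is meant to remove.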
Such computational demand further limits our ability to predict watershed dynamics under hydro-climatic extremes or disturbances. To address this problem, a deep learning (DL)-based emulator can be an ideal alternative to the process-based model for fast simulation [5], thereby speeding up the ModEx life cycle. However, using DL techniques in ModEx poses the following challenges:

1. The first challenge concerns the trustworthiness of the DL-based emulator [6,7]. That is, we need an emulator that captures the dynamical interdependencies of the corresponding process-based model. A trustworthy emulator not only provides accurate predictions but also helps identify unrepresented dynamics using observations in the ModEx life cycle. However, DL models are usually trained [8] without explicitly accounting for the interactions between inputs and outputs, so their trustworthiness is not guaranteed.

2. The second challenge concerns the interpretability of the DL-based emulator [9]. That is, once the model is trained, we must be able to interpret each individual prediction to understand how the input features (i.e., atmospheric forcing and model parameters) influence the predictions (i.e., simulated QoIs). Such interpretation helps answer questions such as to what extent a flooding event contributes to downstream nutrient loading, thereby guiding new sampling and measurement activities in the ModEx life cycle. However, the black-box nature of DL models masks these dependencies and thus prevents such predictive understanding of simulated QoIs.
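The two challenges can be made concrete with a deliberately simple Python sketch. It assumes a hypothetical linear-reservoir model as a stand-in for the process-based model and, in place of a DL emulator, fits a linear least-squares surrogate on lagged precipitation so that every quantity can be checked exactly. Trustworthiness is probed by asking whether the surrogate recovers the process model's input-output dependence, i.e., its impulse-response kernel (1/k)(1 - 1/k)^(L-1) at lag L. Interpretability is illustrated with gradient × input attribution, which for a linear model decomposes a prediction exactly into per-input contributions; for a neural-network emulator one would instead turn to methods such as layer-wise relevance propagation or LIME. The storm pulse, lag count, and function names below are all hypothetical.

```python
import numpy as np


def linear_reservoir(precip, k):
    """Toy process-based model (assumption): dS/dt = P - S/k, Q = S/k (Euler, dt = 1)."""
    storage, runoff = 0.0, np.empty(len(precip))
    for t, p in enumerate(precip):
        runoff[t] = storage / k        # outflow from current storage
        storage += p - runoff[t]       # add forcing, remove outflow
    return runoff


def lagged_features(precip, n_lags):
    """Row t holds [P(t-1), ..., P(t-n_lags)]; lags before the record start are zero."""
    X = np.zeros((len(precip), n_lags))
    for lag in range(1, n_lags + 1):
        X[lag:, lag - 1] = precip[:-lag]
    return X


def fit_emulator(precip, runoff, n_lags):
    """Linear least-squares surrogate standing in for a DL emulator."""
    weights, *_ = np.linalg.lstsq(lagged_features(precip, n_lags), runoff, rcond=None)
    return weights


def attribute(weights, features):
    """Gradient x input; for a linear model the contributions sum to the prediction."""
    return weights * features


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, n_lags = 5.0, 60
    precip = rng.exponential(scale=2.0, size=1000)
    precip[500] = 100.0                       # hypothetical storm pulse on day 500
    runoff = linear_reservoir(precip, k)
    weights = fit_emulator(precip, runoff, n_lags)

    # Trustworthiness: does the surrogate reproduce the process model's
    # input-output dependence (its impulse-response kernel)?
    kernel = (1.0 / k) * (1.0 - 1.0 / k) ** np.arange(n_lags)
    print("max kernel error:", np.abs(weights - kernel).max())

    # Interpretability: share of day-503 runoff attributed to the day-500 storm (lag 3).
    contrib = attribute(weights, lagged_features(precip, n_lags)[503])
    print("storm share of day-503 runoff:", contrib[2] / contrib.sum())
```

The final print answers a toy version of the question raised in the second challenge: how much of the runoff three days after a storm is attributable to that storm.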

[1] Rushil Anirudh, et al. Improved surrogates in inertial confinement fusion with manifold and cycle consistencies, 2019, Proceedings of the National Academy of Sciences.

[2] Luís Torgo, et al. SMOGN: a Pre-processing Approach for Imbalanced Regression, 2017, LIDTA@PKDD/ECML.

[3] P. Shuai, et al. Dam Operations and Subsurface Hydrogeology Control Dynamics of Hydrologic Exchange Flows in a Regulated River Reach, 2019, Water Resources Research.

[4] Praveen Kumar, et al. Patterns of change in high frequency precipitation variability over North America, 2017, Scientific Reports.

[5] Peishi Jiang, et al. Information transfer from causal history in complex system dynamics, 2019, Physical Review E.

[6] Anuj Karpatne, et al. Physics-Guided Machine Learning for Scientific Discovery: An Application in Simulating Lake Temperature Profiles, 2020, Trans. Data Sci.

[7] Max Welling, et al. Semi-Supervised Classification with Graph Convolutional Networks, 2016, ICLR.

[8] Marc Bocquet, et al. Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the Lorenz 96 model, 2019, J. Comput. Sci.

[9] E. Fischer, et al. Frequency of extreme precipitation increases extensively with event rareness under global warming, 2019, Scientific Reports.

[10] Alexander Binder, et al. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation, 2015, PLoS ONE.

[11] Wojciech Samek, et al. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 2019, Explainable AI.

[12] Shuzhen Yao, et al. Neural Stochastic Differential Equations with Neural Processes Family Members for Uncertainty Estimation in Deep Learning, 2021, Sensors.

[13] Praveen Kumar, et al. Debates—Does Information Theory Provide a New Paradigm for Earth Science? Causality, Interaction, and Feedback, 2020, Water Resources Research.

[14] Matthew Hutson. AI shortcuts speed up simulations by billions of times, 2020, Science.

[15] Christoph Kelp, et al. Trustworthy artificial intelligence, 2023, Asian Journal of Philosophy.

[16] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[17] Assessment of future climate change impacts on nonpoint source pollution in snowmelt period for a cold area using SWAT, 2018, Scientific Reports.

[18] Paris Perdikaris, et al. Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations, 2017, ArXiv.

[19] Luciano Floridi, et al. Establishing the rules for building trustworthy AI, 2019, Nat. Mach. Intell.