Application of Machine Learning Methods in Inferring Surface Water Groundwater Exchanges using High Temporal Resolution Temperature Measurements

We examine the ability of machine learning (ML) and deep learning (DL) algorithms to infer surface water/groundwater exchange flux from subsurface temperature observations. The observations and fluxes are produced by a high-resolution numerical model representing conditions in the Columbia River near the Department of Energy Hanford site in southeastern Washington State. Random measurement error of varying magnitude is added to the synthetic temperature observations. The results indicate that both ML and DL methods can be used to infer the exchange flux. DL methods, especially convolutional neural networks, outperform the ML methods when used to interpret noisy temperature data with a smoothing filter applied. However, the ML methods also perform well, and they can better identify a reduced set of important observations, which could be useful for measurement network optimization. Surprisingly, the ML and DL methods inferred upward flux better than downward flux. This is in direct contrast to previous findings that used numerical models to infer flux from temperature observations, and it suggests that combining ML or DL inference with numerical inference could improve flux estimation beneath river systems.
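The noise-and-smoothing step described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the sinusoidal "temperature" signal, the zero-mean Gaussian error model, the three noise levels, the centered moving-average filter, and its window length. The abstract does not state which error distribution or smoothing filter was used.

```python
import math
import random


def add_noise(series, sigma, rng):
    """Return a copy of series with zero-mean Gaussian error (std = sigma) added."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def moving_average(series, window):
    """Centered moving average; the window shrinks near the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical diurnal temperature signal: 3 days sampled every 5 minutes
# (288 samples per day). These values are illustrative only.
n = 3 * 288
clean = [12.0 + 2.0 * math.sin(2.0 * math.pi * i / 288) for i in range(n)]

results = {}
for sigma in (0.05, 0.1, 0.5):  # assumed noise magnitudes, in degrees C
    noisy = add_noise(clean, sigma, rng)
    smooth = moving_average(noisy, window=13)
    results[sigma] = (rmse(noisy, clean), rmse(smooth, clean))
    print(f"sigma={sigma}: noisy RMSE={results[sigma][0]:.4f}, "
          f"smoothed RMSE={results[sigma][1]:.4f}")
```

In this toy setting the smoothed series is closer to the noise-free signal than the raw noisy series at each noise level. That matches the motivation for filtering the temperature data before passing it to an ML or DL model, though the filter actually used in the study may differ.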
