Bayesian assessment of the expected data impact on prediction confidence in optimal sampling design

Incorporating hydro(geo)logical data, such as head and tracer data, into stochastic models of (subsurface) flow and transport helps to reduce prediction uncertainty. Because investigation campaigns are financially limited, the information needs of modeling or prediction goals should be satisfied efficiently and rationally. Optimal design techniques find the best one among a set of investigation strategies. They optimize the expected impact of data on prediction confidence or related objectives prior to data collection. We introduce a new optimal design method, called PreDIA(gnosis) (Preposterior Data Impact Assessor). PreDIA derives the relevant probability distributions and measures of data utility within a fully Bayesian, generalized, flexible, and accurate framework. It extends the bootstrap filter (BF) and related frameworks to optimal design by marginalizing utility measures over the yet unknown data values. PreDIA is a strictly formal information-processing scheme free of linearizations. It works with arbitrary simulation tools, provides full flexibility concerning measurement types (linear, nonlinear, direct, indirect), allows for any desired task-driven formulations, and can account for various sources of uncertainty (e.g., heterogeneity, geostatistical assumptions, boundary conditions, measurement values, model structure uncertainty, a large class of model errors) via Bayesian geostatistics and model averaging. Existing methods fail to provide these crucial advantages simultaneously, which our method buys at a relatively higher computational cost. We demonstrate the applicability and advantages of PreDIA over conventional linearized methods in a synthetic example of subsurface transport. In the example, we show that informative data are often invisible to linearized methods, which confuse zero correlation with statistical independence. Hence, PreDIA will often lead to substantially better sampling designs. Finally, we extend our example to specifically highlight the consideration of conceptual model uncertainty.
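
The core idea of marginalizing a utility measure over yet unknown data values with bootstrap-filter weights can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model (a standard normal parameter `theta`, a prediction `theta**2`, a candidate measurement of `theta` with Gaussian error), the function names, and the choice of posterior variance as the utility measure are all assumptions made for illustration. In this toy setting the measurement and the prediction are uncorrelated but strongly dependent, so a linearized estimate would be expected to report almost no data impact while the weighted-ensemble estimate does not.

```python
import numpy as np


def expected_posterior_variance(pred, sim_data, noise_std, n_synthetic, rng):
    """Preposterior estimate: posterior variance of `pred`, averaged over
    synthetic data sets drawn from the prior ensemble itself.

    pred      -- prediction values, one per prior realization, shape (n,)
    sim_data  -- simulated noise-free measurement per realization, shape (n,)
    noise_std -- assumed standard deviation of the Gaussian measurement error
    """
    n = pred.shape[0]
    # Pick realizations that act as hypothetical "truths" and perturb their
    # simulated measurements with random error to obtain possible data values.
    idx = rng.choice(n, size=n_synthetic, replace=False)
    y_obs = sim_data[idx] + noise_std * rng.standard_normal(n_synthetic)

    # Bootstrap-filter weights: Gaussian likelihood of every realization
    # under every synthetic data value, normalized per data value.
    resid = y_obs[:, None] - sim_data[None, :]        # shape (n_synthetic, n)
    log_w = -0.5 * (resid / noise_std) ** 2
    log_w -= log_w.max(axis=1, keepdims=True)         # numerical stability
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)

    # Weighted posterior mean and variance of the prediction per data value.
    post_mean = w @ pred
    post_var = np.maximum(w @ pred**2 - post_mean**2, 0.0)

    # Marginalize the utility measure over the unknown data values.
    return post_var.mean()


def linearized_posterior_variance(pred, sim_data, noise_std):
    """Linear (Kalman-type) estimate that uses only the covariance between
    the prediction and the measurement."""
    cov_zy = np.cov(pred, sim_data)[0, 1]
    var_y = np.var(sim_data, ddof=1) + noise_std**2
    return np.var(pred, ddof=1) - cov_zy**2 / var_y


def toy_comparison(n=5000, n_synthetic=200, noise_std=0.1, seed=0):
    """Hypothetical toy problem: the measurement observes theta, the
    prediction is theta**2, so their correlation is (close to) zero."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n)
    pred = theta**2
    prior_var = np.var(pred, ddof=1)
    bf_var = expected_posterior_variance(pred, theta, noise_std, n_synthetic, rng)
    lin_var = linearized_posterior_variance(pred, theta, noise_std)
    return prior_var, bf_var, lin_var


if __name__ == "__main__":
    prior_var, bf_var, lin_var = toy_comparison()
    print(f"prior variance:                {prior_var:.3f}")
    print(f"expected posterior (weighted): {bf_var:.3f}")
    print(f"expected posterior (linear):   {lin_var:.3f}")
```

In an optimal design loop, a function like `expected_posterior_variance` would be evaluated once per candidate sampling design and the design with the smallest value would be preferred. The ensemble size, the number of synthetic data sets, and the Gaussian error model used here are placeholders, not values taken from the paper.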
