Entropy-Based Experimental Design for Optimal Model Discrimination in the Geosciences

Choosing between competing models lies at the heart of scientific work, and is a frequent motivation for experimentation. Optimal experimental design (OD) methods maximize the benefit of experiments towards a specified goal. We advance and demonstrate an OD approach to maximize the information gained towards model selection. We make use of so-called model choice indicators, which are random variables with an expected value equal to Bayesian model weights. Their uncertainty can be measured with Shannon entropy. Since the experimental data are still random variables in the planning phase of an experiment, we use mutual information (the expected reduction in Shannon entropy) to quantify the information gained from a proposed experimental design. For implementation, we use the Preposterior Data Impact Assessor framework (PreDIA), because it is free of the lower-order approximations of mutual information often found in the geosciences. In comparison to other studies in statistics, our framework is not restricted to sequential design or to discrete-valued data, and it can handle measurement errors. As an application example, we optimize an experiment about the transport of contaminants in clay, featuring the problem of choosing between competing isotherms to describe sorption. We compare the results of optimizing towards maximum model discrimination with an alternative OD approach that minimizes the overall predictive uncertainty under model choice uncertainty.
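As an illustration of the design criterion described above, the following minimal sketch computes the mutual information between a model choice indicator and the yet-unobserved data as the expected reduction in Shannon entropy of the model weights. It is not the PreDIA implementation: it assumes, for simplicity, a finite set of possible data outcomes and a known likelihood table, whereas the framework described here also handles continuous data. The names `shannon_entropy`, `mutual_information`, `prior`, and `likelihood` are hypothetical and chosen for this example only.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete probability vector."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # treat 0 * log(0) as 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def mutual_information(prior, likelihood):
    """Expected entropy reduction of the model weights for one design.

    prior      : shape (K,), prior model weights P(M_k), summing to 1
    likelihood : shape (K, J), P(y_j | M_k) for J discrete data outcomes
                 (assumption: discretized data; each row sums to 1)
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = prior[:, None] * likelihood   # P(M_k, y_j)
    evidence = joint.sum(axis=0)          # P(y_j), marginal over models
    mi = shannon_entropy(prior)           # prior entropy of the model choice
    for j, p_y in enumerate(evidence):
        if p_y > 0:
            posterior = joint[:, j] / p_y  # Bayesian model weights given y_j
            mi -= p_y * shannon_entropy(posterior)
    return mi


# Two hypothetical candidate designs for two equally likely models:
# design A separates the models perfectly, design B not at all.
prior = [0.5, 0.5]
design_a = [[1.0, 0.0], [0.0, 1.0]]
design_b = [[0.3, 0.7], [0.3, 0.7]]
scores = {"A": mutual_information(prior, design_a),
          "B": mutual_information(prior, design_b)}
best_design = max(scores, key=scores.get)
```

Under these assumptions, the design with the largest score is the one expected to discriminate best between the models; a perfectly discriminating design recovers the full prior entropy, while a design whose data look the same under every model scores zero.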
