A Unifying Framework for Probabilistic Validation Metrics

Probabilistic modeling methods are increasingly employed in engineering applications, where they make inferences about the distributions of output quantities of interest. A challenge in applying probabilistic computer models (simulators) is validating their output distributions against samples of observational data. An ideal validation metric intuitively conveys the key differences between the simulator output and observational distributions, as a statistical distance or divergence does. Within the literature, only a small set of statistical distances/divergences has been used for this task, often selected on the basis of user experience and without reference to the wider variety available. This paper therefore offers a unifying framework of statistical distances/divergences, categorizing those already implemented in the literature, providing a greater understanding of their benefits, and proposing new measures as potential validation metrics. Two families of measures for quantifying differences between distributions, which together encompass the existing statistical distances/divergences in the literature, are analyzed: f-divergences and integral probability metrics (IPMs). Specific measures from these families are highlighted, providing an assessment of current and new validation metrics, with a discussion of their merits in determining simulator adequacy; the result is a set of validation metrics with greater sensitivity in quantifying differences across the range of probability mass.
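To make the two families concrete, the sketch below computes one representative measure from each between a set of simulator output samples and a set of observational samples: a histogram-based Hellinger distance (an f-divergence) and a biased maximum mean discrepancy (MMD) estimate with an RBF kernel (an IPM). This is a minimal illustration, not the paper's method; the sample sizes, bin count, and kernel bandwidth are illustrative assumptions.

```python
import numpy as np


def hellinger_distance(p_samples, q_samples, bins=30):
    """Histogram estimate of the Hellinger distance (an f-divergence).

    Both sample sets are binned on a shared grid so the resulting
    discrete distributions are directly comparable.
    """
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(p_samples, bins=edges)
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum()  # normalize counts to probability mass
    q = q / q.sum()
    # H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), bounded in [0, 1]
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def mmd2_rbf(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel (an IPM)."""
    def gram(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-d ** 2 / (2.0 * sigma ** 2))
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()


rng = np.random.default_rng(0)
sim = rng.normal(0.0, 1.0, 500)   # stand-in for simulator output samples
obs = rng.normal(0.5, 1.2, 500)   # stand-in for observational samples

h = hellinger_distance(sim, obs)
m = mmd2_rbf(sim, obs)
```

Both quantities are zero when the two sample sets coincide and grow as the distributions separate; the Hellinger distance compares probability mass bin by bin, while the MMD compares kernel mean embeddings and so needs no binning.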
