Validation of Bayesian posterior distributions using a multidimensional Kolmogorov-Smirnov test

We extend the Kolmogorov‐Smirnov (K-S) test to multiple dimensions by suggesting an R^n → [0, 1] mapping based on the probability content of the highest probability density region of the reference distribution under consideration; this mapping reduces the problem back to the one-dimensional case, to which the standard K-S test may be applied. The universal character of this mapping also allows us to introduce a simple, yet general, method for the validation of Bayesian posterior distributions of any dimensionality. This new approach goes beyond validating software implementations; it provides a sensitive test for all assumptions, explicit or implicit, that underlie the inference. In particular, the method assesses whether the inferred posterior distribution is a truthful representation of the actual constraints on the model parameters. We illustrate our multidimensional K-S test by applying it to a simple two-dimensional Gaussian toy problem, and demonstrate our method for posterior validation in the real-world astrophysical application of estimating the physical parameters of galaxy clusters from their Sunyaev‐Zel’dovich effect in microwave background data. In the latter example, we show that the method can validate the entire Bayesian inference process across a varied population of objects for which the derived posteriors are different in each case.
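The mapping described above can be illustrated with a minimal sketch for the two-dimensional Gaussian case. It is not the authors' implementation. It assumes a Gaussian reference distribution, for which the probability content of the highest probability density region bounded by the contour through a point x has a closed form: the chi-squared CDF (with n degrees of freedom) evaluated at the squared Mahalanobis distance of x. The function name `hpd_probability_content`, the covariance matrix, the sample size, and the offset used for the mismatched case are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy import stats


def hpd_probability_content(x, mean, cov):
    """Map points in R^n to [0, 1] for a Gaussian reference N(mean, cov).

    Returns the probability mass of the HPD region whose boundary passes
    through each point (closed form valid for a Gaussian reference only).
    """
    x = np.atleast_2d(x)
    diff = x - mean
    # Squared Mahalanobis distance of each point from the mean.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # For a Gaussian, d2 follows a chi-squared law with n degrees of freedom.
    return stats.chi2.cdf(d2, df=x.shape[1])


rng = np.random.default_rng(0)
mean = np.zeros(2)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])  # illustrative covariance

# Samples genuinely drawn from the reference: mapped values should be uniform.
samples = rng.multivariate_normal(mean, cov, size=2000)
zeta = hpd_probability_content(samples, mean, cov)
result = stats.kstest(zeta, "uniform")

# Samples offset from the reference: the one-dimensional test should reject.
shifted = samples + np.array([1.0, 0.0])
zeta_shifted = hpd_probability_content(shifted, mean, cov)
result_shifted = stats.kstest(zeta_shifted, "uniform")

print("matched reference p-value:", result.pvalue)
print("shifted samples p-value:  ", result_shifted.pvalue)
```

If the samples really follow the reference distribution, the mapped values are uniformly distributed on [0, 1], so the ordinary one-dimensional K-S test against the uniform distribution applies directly. For a non-Gaussian reference the closed form above no longer holds, and the probability content would have to be estimated some other way, for example from samples of the reference distribution.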
