Relevant statistics for Bayesian model choice

The choice of summary statistics used in Bayesian inference, and in particular in approximate Bayesian computation (ABC) algorithms, bears on the validity of the resulting inference. Those statistics are nonetheless customarily used in ABC algorithms without consistency checks. We derive necessary and sufficient conditions on summary statistics for the corresponding Bayes factor to be convergent, namely to asymptotically select the true model. Those conditions, which amount to requiring that the expectations of the summary statistics differ asymptotically under the two models, are quite natural and can be exploited in ABC settings to assess, via a Monte Carlo validation, whether a given choice of summary statistics is appropriate.
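The condition on expectations can be illustrated with a small simulation. The sketch below is a simplified, fixed-parameter illustration and not the validation procedure of the paper: the two models (a standard normal and a Laplace distribution scaled to unit variance), the three candidate statistics, the function names, and the sample sizes are all assumptions chosen for the example. For each statistic it estimates the expectation under each model by Monte Carlo and reports a standardized gap between the two estimates.

```python
import math
import random
import statistics

# Assumed toy models (not taken from the source): both have mean 0 and variance 1.
def simulate_normal(n, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def simulate_laplace(n, rng):
    b = 1.0 / math.sqrt(2.0)  # Laplace scale giving variance 2 * b**2 = 1
    # The difference of two independent exponentials with mean b is Laplace(0, b).
    return [rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for _ in range(n)]

# Candidate summary statistics (hypothetical choices for illustration).
def sample_mean(x):
    return statistics.fmean(x)

def sample_variance(x):
    m = statistics.fmean(x)
    return statistics.fmean([(v - m) ** 2 for v in x])

def mean_abs(x):
    return statistics.fmean([abs(v) for v in x])

CANDIDATES = {"mean": sample_mean, "variance": sample_variance, "mean_abs": mean_abs}

def expectation_gaps(n=200, reps=300, seed=1):
    """Return, for each statistic, the absolute difference between its Monte Carlo
    mean under the two models, divided by the standard error of that difference."""
    rng = random.Random(seed)
    draws = {name: ([], []) for name in CANDIDATES}
    for _ in range(reps):
        x1 = simulate_normal(n, rng)
        x2 = simulate_laplace(n, rng)
        for name, stat in CANDIDATES.items():
            draws[name][0].append(stat(x1))
            draws[name][1].append(stat(x2))
    gaps = {}
    for name, (s1, s2) in draws.items():
        diff = statistics.fmean(s1) - statistics.fmean(s2)
        se = math.sqrt(statistics.variance(s1) / reps + statistics.variance(s2) / reps)
        gaps[name] = abs(diff) / se
    return gaps

if __name__ == "__main__":
    for name, z in expectation_gaps().items():
        print(f"{name}: standardized gap = {z:.1f}")
```

Under these assumptions the sample mean and sample variance have the same expectation under both models, so their standardized gaps should stay small, whereas the mean absolute value has expectation sqrt(2/pi) under the normal model and 1/sqrt(2) under the Laplace model, so its gap should be large. A statistic of the first kind would be a poor basis for choosing between these two models; one of the second kind at least satisfies the differing-expectations requirement in this toy setting.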
