Amount of Information Needed for Model Choice in Approximate Bayesian Computation

Approximate Bayesian Computation (ABC) has become a popular technique in evolutionary genetics for elucidating population structure and history, owing to its flexibility. The statistical framework underlying it has advanced significantly in recent years. In population genetics, however, the outcome of an ABC analysis depends heavily on the amount of information in the dataset, whether that is the level of genetic variation or the number of samples and loci. Here we examine the power to reject a simple constant-population-size coalescent model in favor of a bottleneck model in datasets of varying quality. This power depends not only on the number of samples and loci but also, strongly, on the level of nucleotide diversity in the observed dataset. Although model choice in an ABC setting is fairly powerful overall and quite conservative with respect to false positives, detecting weaker bottlenecks is problematic in smaller or less genetically diverse datasets. This limits the inferences possible in non-model organisms, where the information available to distinguish the two models is often limited. Our results show that these limitations must be considered when performing an ABC analysis, and that studies should run simulations matched to the size and nature of their dataset in order to fully assess their statistical power.
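The recommendation to assess power by simulation can be illustrated with a small rejection-ABC model choice between a constant-size model and a bottleneck model. What follows is a minimal sketch under stated assumptions, not the authors' pipeline. The coalescent simulator is a toy written for this example, with no recombination and a single bottleneck window. The summary statistics are mean segregating sites (S) and mean pairwise differences (pi) across loci. The uniform priors, fixed bottleneck duration, acceptance fraction and 0.5 decision threshold are all illustrative choices. The function names `simulate_locus`, `summarize`, `reference_table`, `posterior_bottleneck` and `power` are hypothetical.

```python
import numpy as np


def simulate_locus(n, theta, rng, bottleneck=None):
    """Toy coalescent for one neutral, non-recombining locus (assumption).

    Assumed scaling: time in units of 2N generations, theta = 4*N*mu.
    `bottleneck` is a hypothetical (start, duration, relative_size) tuple;
    inside that window the coalescence rate is divided by relative_size.
    Returns (S, pi) for a sample of n sequences.
    """
    sizes = [1] * n          # number of sampled descendants below each lineage
    sfs = np.zeros(n)        # unfolded site frequency spectrum, indexed by derived count
    t = 0.0
    while len(sizes) > 1:
        k = len(sizes)
        rel, boundary = 1.0, np.inf  # current relative size, time of next size change
        if bottleneck is not None:
            start, duration, f = bottleneck
            if t < start:
                boundary = start
            elif t < start + duration:
                rel, boundary = f, start + duration
        rate = k * (k - 1) / 2 / rel
        wait = rng.exponential(1.0 / rate)
        merge = t + wait < boundary
        if not merge:
            wait = boundary - t  # memoryless: stop at the size change, redraw next loop
        # Mutations fall on each branch as a Poisson process with rate theta/2.
        np.add.at(sfs, sizes, rng.poisson(theta / 2 * wait, size=k))
        t = t + wait if merge else boundary
        if merge:
            i = int(rng.integers(k))
            j = int(rng.integers(k - 1))
            if j >= i:
                j += 1           # second lineage, distinct from the first
            merged = sizes[i] + sizes[j]
            for idx in sorted((i, j), reverse=True):
                del sizes[idx]
            sizes.append(merged)
    counts = np.arange(n)
    pi = (counts * (n - counts) * sfs).sum() / (n * (n - 1) / 2)
    return sfs.sum(), pi


def summarize(n, theta, n_loci, rng, bottleneck=None):
    """Mean S and mean pi over independent loci (hypothetical summary set)."""
    return np.mean([simulate_locus(n, theta, rng, bottleneck) for _ in range(n_loci)], axis=0)


def reference_table(n, n_loci, n_sims, rng):
    """Draw a model and parameters from hypothetical priors; store summaries."""
    models = rng.integers(2, size=n_sims)  # 0 = constant size, 1 = bottleneck (equal priors)
    stats = np.empty((n_sims, 2))
    for r, m in enumerate(models):
        theta = rng.uniform(1.0, 10.0)
        bn = (rng.uniform(0.0, 0.5), 0.1, rng.uniform(0.01, 0.5)) if m == 1 else None
        stats[r] = summarize(n, theta, n_loci, rng, bn)
    return models, stats


def posterior_bottleneck(obs, models, stats, accept_frac=0.05):
    """Rejection ABC: share of bottleneck runs among the closest simulations."""
    scale = stats.std(axis=0)            # standardize each summary statistic
    dist = np.sqrt((((stats - obs) / scale) ** 2).sum(axis=1))
    keep = dist <= np.quantile(dist, accept_frac)
    return models[keep].mean()           # approximates P(bottleneck | data) under equal priors


def power(n, n_loci, theta, bottleneck, models, stats, n_pods, rng, threshold=0.5):
    """Fraction of pseudo-observed bottleneck datasets assigned to the bottleneck model."""
    hits = 0
    for _ in range(n_pods):
        pod = summarize(n, theta, n_loci, rng, bottleneck)
        hits += posterior_bottleneck(pod, models, stats) > threshold
    return hits / n_pods


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    models, stats = reference_table(n=10, n_loci=5, n_sims=1000, rng=rng)
    print(power(10, 5, 5.0, (0.1, 0.1, 0.05), models, stats, n_pods=50, rng=rng))
```

You can rerun `power` with different sample sizes, numbers of loci, values of theta or bottleneck strengths. This checks, before data collection, whether a given design could plausibly detect the bottleneck of interest. A real study would use an established coalescent simulator and richer summary statistics.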
