Inference of structure in subdivided populations at low levels of genetic differentiation - the correlated allele frequencies model revisited

MOTIVATION: This article considers the problem of estimating population genetic subdivision from multilocus genotype data. The model considered makes use of the genotypes and, optionally, the spatial coordinates of the sampled individuals. Particular attention is paid to the case of low genetic differentiation, using a previously described Bayesian clustering model in which allele frequencies are assumed to be correlated a priori. Under this model, various inference problems are considered, in particular the common and difficult, but still unaddressed, situation where the number of populations is unknown.

RESULTS: A Markov chain Monte Carlo algorithm and a new post-processing scheme are proposed. It is shown that they significantly improve on the accuracy of previously existing algorithms in terms of the estimated number of populations and the estimated population membership. This is illustrated numerically with data simulated from the prior-likelihood model used in inference and with data simulated from a Wright-Fisher model. Improvements are also illustrated on a real dataset of 88 wolverines (Gulo gulo) genotyped at 10 microsatellite loci. The solutions presented here are not specific to any clustering model and are hence relevant to many settings in population genetics where weakly differentiated populations are assumed or sought.

AVAILABILITY: The improvements implemented will be made available in version 3.0.0 of the R package Geneland. Information on how to get and use the software is available from http://folk.uio.no/gillesg/Geneland.html.

SUPPLEMENTARY INFORMATION: http://folk.uio.no/gillesg/CFM/SuppMat.pdf.
