Correcting for ascertainment bias in the inference of population structure

BACKGROUND The ascertainment process of molecular markers amounts to disregarding loci that carry alleles with low frequencies. If the inference algorithm does not properly account for it, this can strongly bias inferences under population genetics models. Modelling this censoring process when inferring population structure (i.e. identifying clusters of individuals) raises challenging numerical difficulties. METHOD These difficulties stem from the presence of intractable normalizing constants in Metropolis-Hastings acceptance ratios. They can be overcome with a Markov chain Monte Carlo (MCMC) algorithm known as the single variable exchange algorithm (SVEA). RESULT We show how this general solution can be implemented for a class of clustering models of broad interest in population genetics, which includes the models underlying the computer programs STRUCTURE, GENELAND and GESTE. We also implement the proposed method on a simple example and show that it reduces the bias substantially. AVAILABILITY Further details and a computer program implementing the method are available from http://folk.uio.no/gillesg/AscB/.
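
To illustrate how an exchange-type move avoids the normalizing constant, the following is a minimal Python sketch on a toy problem. It is not the clustering model or the program described above. It assumes a single allele frequency `theta` shared by all loci, `n` sampled gene copies per locus, a locus retained only if its minor-allele count is at least `min_count`, and a uniform prior on (0.01, 0.99). All function names and parameter values are hypothetical. Each iteration draws auxiliary data from the ascertained model at the proposed value, so the ascertainment probability P(retained | theta) cancels in the acceptance ratio and never has to be computed.

```python
import numpy as np

def log_unnorm_lik(counts, n, theta):
    """Log binomial kernel summed over loci. Binomial coefficients and the
    ascertainment constant P(retained | theta) are deliberately left out."""
    counts = np.asarray(counts)
    return float(np.sum(counts * np.log(theta) + (n - counts) * np.log1p(-theta)))

def sample_ascertained(rng, n, theta, n_loci, min_count):
    """Rejection sampling: draw Binomial(n, theta) counts and keep only loci
    whose minor-allele count is at least min_count."""
    kept = np.empty(0, dtype=np.int64)
    while kept.size < n_loci:
        draws = rng.binomial(n, theta, size=4 * n_loci)
        ok = np.minimum(draws, n - draws) >= min_count
        kept = np.concatenate([kept, draws[ok]])
    return kept[:n_loci]

def exchange_sampler(counts, n, min_count, n_iter=5000, step=0.02, seed=0):
    """Random-walk exchange sampler for theta (toy model, assumed prior)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    theta = 0.5
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()
        # Assumed uniform prior on (0.01, 0.99): proposals outside are rejected.
        if 0.01 < prop < 0.99:
            # Auxiliary data simulated under the proposed value.
            aux = sample_ascertained(rng, n, prop, counts.size, min_count)
            # Unnormalized terms only; the normalizing constants cancel.
            log_ratio = (log_unnorm_lik(counts, n, prop)
                         - log_unnorm_lik(counts, n, theta)
                         + log_unnorm_lik(aux, n, theta)
                         - log_unnorm_lik(aux, n, prop))
            if np.log(rng.uniform()) < log_ratio:
                theta = prop
        chain[t] = theta
    return chain

# Simulated ascertained data with a known frequency (illustrative values).
true_theta, n, min_count = 0.1, 20, 2
data_rng = np.random.default_rng(1)
counts = sample_ascertained(data_rng, n, true_theta, 200, min_count)

naive = counts.mean() / n                 # ignores ascertainment
chain = exchange_sampler(counts, n, min_count)
corrected = chain[1000:].mean()           # posterior mean after burn-in
print("naive:", naive, "corrected:", corrected)
```

In this toy setting the ascertainment probability is in fact easy to compute, so the exchange move is used purely for illustration; the approach matters in models where that constant is not tractable.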
