Bayesian spatial modeling of genetic population structure

Natural populations of living organisms often have complex histories consisting of phases of expansion and decline, and the migratory patterns within them may fluctuate over space and time. When parts of a population become relatively isolated, e.g., due to geographical barriers, stochastic forces reshape certain DNA characteristics of the individuals over generations such that they reflect the restricted migration and mating/reproduction patterns. Such populations are typically termed genetically structured, and they may be represented statistically in terms of several clusters between which DNA variation differs clearly. When detailed knowledge of the ancestry of a natural population is lacking, the DNA characteristics of a sample of current-generation individuals often provide a wealth of information in this respect. Several statistical approaches to model-based clustering of such data have been introduced, and the Bayesian approach to modeling the genetic structure of a population in particular has attracted considerable interest among biologists. However, the possibility of utilizing spatial information from sampled individuals in the inference about genetic clusters has been incorporated into such analyses only very recently. While standard Bayesian hierarchical modeling techniques based on Markov chain Monte Carlo simulation provide flexible means for describing even subtle patterns in data, they may also result in computationally challenging procedures in practical data analysis. Here we develop a method for modeling spatial genetic structure using a combination of analytical and stochastic methods. We achieve this by extending a novel theory of Bayesian predictive classification with the available spatial information, described here in terms of a colored Voronoi tessellation over the sample domain. Our results for real and simulated data sets illustrate the benefits of incorporating spatial information into such an analysis.
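
To make the idea of a colored Voronoi tessellation concrete, the following is a minimal sketch and not the method developed in the paper. It assumes that each sampled individual has a two-dimensional coordinate and a cluster label (the "color"), and that any location in the sample domain takes the color of its nearest sampling site. The function name `voronoi_color`, the coordinates, and the labels are hypothetical and chosen only for illustration.

```python
import math

def voronoi_color(query, sites, labels):
    """Return the cluster label ("color") of the Voronoi cell containing `query`.

    A point lies in the Voronoi cell of the site nearest to it, so finding
    the nearest site is enough; no explicit cell geometry is built.
    """
    nearest = min(range(len(sites)), key=lambda i: math.dist(query, sites[i]))
    return labels[nearest]

# Hypothetical sampling sites (x, y) and an assumed cluster assignment.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1]

# Color a coarse grid over the sample domain; neighboring cells that share
# a color merge into one spatially contiguous cluster region.
grid = [[voronoi_color((x, y), sites, labels) for x in range(7)] for y in range(7)]
for row in reversed(grid):  # print with the y axis pointing upward
    print(" ".join(str(c) for c in row))
```

In this sketch the coloring is fixed in advance; in a clustering analysis the labels would be the unknown quantity to be inferred from the genetic data.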
