Relative performance of Bayesian clustering software for inferring population substructure and individual assignment at low levels of population differentiation

Traditional methods for characterizing genetic differentiation among populations rely on a priori grouping of individuals. Bayesian clustering methods avoid this limitation by using linkage and Hardy–Weinberg disequilibrium to decompose a sample of individuals into genetically distinct groups. There are several software programs available for Bayesian clustering analyses, all of which describe a decrease in the ability to detect distinct clusters as levels of genetic differentiation among populations decrease. However, no study has yet compared the performance of such methods at low levels of population differentiation, which may be common in species where populations have experienced recent separation or high levels of gene flow. We used simulated data to evaluate the performance of three Bayesian clustering software programs, PARTITION, STRUCTURE, and BAPS, at levels of population differentiation below FST=0.1. PARTITION was unable to correctly identify the number of subpopulations until levels of FST reached around 0.09. Both STRUCTURE and BAPS performed very well at low levels of population differentiation, and were able to correctly identify the number of subpopulations at FST around 0.03. The average proportion of an individual’s genome assigned to its true population of origin increased with increasing FST for both programs, reaching over 92% at an FST of 0.05. The average number of misassignments (assignments to the incorrect subpopulation) continued to decrease as FST increased, and when FST was 0.05, fewer than 3% of individuals were misassigned using either program. Both STRUCTURE and BAPS worked extremely well for inferring the number of clusters when clusters were not well-differentiated (FST=0.02–0.03), but our results suggest that FST must be at least 0.05 to reach an assignment accuracy of greater than 97%.
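Because the thresholds reported above are expressed in FST, it may help to see how such a value arises from allele frequencies. The following is a minimal sketch, not the simulation or estimation procedure used in the study: it assumes a single biallelic locus, equally sized subpopulations, and known allele frequencies, and it uses the heterozygosity-based definition FST = (HT - HS) / HT. The function names `expected_heterozygosity` and `fst_biallelic` are hypothetical and chosen only for this illustration.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus under Hardy-Weinberg."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Heterozygosity-based FST for one biallelic locus.

    Assumptions (illustrative only): `freqs` holds the frequency of one allele
    in each subpopulation, and all subpopulations are weighted equally.
    """
    if not freqs:
        raise ValueError("at least one subpopulation frequency is required")
    # Total heterozygosity, computed from the mean allele frequency.
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        # Locus is fixed for the same allele everywhere: no differentiation to measure.
        return 0.0
    # Mean within-subpopulation heterozygosity.
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Two subpopulations whose allele frequencies differ by 0.2.
    print(round(fst_biallelic([0.4, 0.6]), 4))
```

Under these assumptions, two subpopulations with allele frequencies of 0.4 and 0.6 give an FST of about 0.04, which falls between the level at which STRUCTURE and BAPS recovered the correct number of clusters (around 0.03) and the level needed for assignment accuracy above 97% (0.05).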
