Assessing statistical power of SNPs for population structure and conservation studies

Single nucleotide polymorphisms (SNPs) have been proposed by some as the new frontier for population studies, and several papers have presented theoretical and empirical evidence on their advantages and limitations. As a practical matter, however, it remains unclear how many SNP markers are required, and what characteristics those markers should have, to obtain sufficient statistical power to detect different levels of population differentiation. We use a hypothetical case to illustrate the process of designing a population genetics project, and present simulation results that address how to maximize statistical power to detect differentiation while minimizing the effort of developing SNPs. Results indicate that (i) while ~30 SNPs should be sufficient to detect moderate (FST = 0.01) levels of differentiation, studies aimed at detecting demographic independence (e.g. FST < 0.005) may require 80 or more SNPs and large sample sizes; (ii) different SNP allele frequencies have little effect on power, and thus selection of SNPs can be relatively unbiased; (iii) increasing the sample size has a strong effect on power, so that the number of loci can be minimized when the sample size is known, and increasing sample size is almost always beneficial; and (iv) power is increased by including multiple SNPs within loci and inferring haplotypes, rather than trying to use only unlinked SNPs. Using haplotypes also has the practical benefit of reducing the SNP ascertainment effort, and may influence the decision of whether to seek SNPs in coding or noncoding regions.
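
The kind of power simulation described above can be illustrated with a minimal sketch. It is not the procedure used in the study. It assumes two populations, unlinked biallelic SNPs, subpopulation allele frequencies drawn from a beta distribution whose variance is set by FST, and a summed per-locus chi-square statistic whose critical value is taken from replicates simulated at FST = 0. All function names, parameter names, and default values are hypothetical.

```python
import random


def sample_allele_count(freq, n_alleles, rng):
    """Binomial draw: copies of the reference allele among n_alleles sampled."""
    return sum(rng.random() < freq for _ in range(n_alleles))


def subpop_freq(p, fst, rng):
    """Draw a subpopulation allele frequency around ancestral frequency p.

    Assumption: beta model with variance fst * p * (1 - p).
    """
    if fst <= 0.0:
        return p  # no differentiation: both populations share p
    scale = (1.0 - fst) / fst
    return rng.betavariate(p * scale, (1.0 - p) * scale)


def summed_chi2(n_loci, n_ind, fst, rng, p_range=(0.1, 0.9)):
    """Sum of per-locus 2x2 chi-square statistics for two sampled populations."""
    m = 2 * n_ind  # alleles sampled per population (diploid individuals)
    total = 0.0
    for _ in range(n_loci):
        p = rng.uniform(*p_range)  # assumed range of ancestral frequencies
        a1 = sample_allele_count(subpop_freq(p, fst, rng), m, rng)
        a2 = sample_allele_count(subpop_freq(p, fst, rng), m, rng)
        pooled = (a1 + a2) / (2 * m)
        if 0.0 < pooled < 1.0:  # monomorphic samples contribute nothing
            expected = m * pooled
            variance = m * pooled * (1.0 - pooled)
            total += ((a1 - expected) ** 2 + (a2 - expected) ** 2) / variance
    return total


def estimate_power(n_loci, n_ind, fst, reps=200, alpha=0.05, seed=1):
    """Fraction of replicates at the given FST that exceed the null critical value."""
    rng = random.Random(seed)
    # Empirical null distribution of the test statistic (FST = 0).
    null = sorted(summed_chi2(n_loci, n_ind, 0.0, rng) for _ in range(reps))
    critical = null[min(reps - 1, int((1.0 - alpha) * reps))]
    hits = sum(summed_chi2(n_loci, n_ind, fst, rng) > critical
               for _ in range(reps))
    return hits / reps
```

Comparing, for example, `estimate_power(30, 50, 0.01)` with `estimate_power(80, 50, 0.005)`, or varying `n_ind` at a fixed number of loci, gives a rough feel for the trade-off between number of SNPs and sample size. The estimates are noisy at small `reps` and depend on the modelling assumptions listed above.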
