Population-Genetic Inference from Pooled-Sequencing Data

Although pooled-population sequencing has become a widely used approach for estimating allele frequencies, most work has proceeded in the absence of a proper statistical framework. We introduce a self-sufficient, closed-form, maximum-likelihood estimator for allele frequencies that accounts for errors associated with sequencing, and a likelihood-ratio test statistic that provides a simple means for evaluating the null hypothesis of monomorphism. Unbiased estimates of allele frequencies $< 1/N$ (where N is the number of individuals sampled) appear to be unachievable, and near-certain identification of a polymorphism requires a minor-allele frequency $> 1/N$. A framework is provided for testing for significant differences in allele frequencies between populations, taking into account sampling at the levels of individuals within populations and sequences within pooled samples. Analyses that fail to account for the two tiers of sampling suffer from very large false-positive rates and can become increasingly misleading with increasing depths of sequence coverage. The power to detect significant allele-frequency differences between two populations is very limited unless both the number of sampled individuals and depth of sequencing coverage exceed 100.
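To make the estimator and the monomorphism test concrete, here is a minimal sketch under an assumed error model. It need not match the paper's exact formulation. The model: p is the frequency of the major base in the pool, and each read shows a wrong base with probability ε, split evenly (ε/3) among the other three bases. Under this model the two least-common bases at a site carry only errors. Their counts therefore give ε̂ directly, and the major-base count then yields p̂ in closed form. The function names, the sorting rule for labelling major and minor bases, and the χ² (1 df) reference distribution are all illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def _xlogy(x: float, y: float) -> float:
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def ml_allele_frequency(counts: Sequence[int]) -> Tuple[float, float]:
    """Closed-form ML estimates (p_hat, eps_hat) from the four base counts at one site.

    Assumed (illustrative) model: p = major-allele frequency in the pool; each read
    shows a wrong base with probability eps, spread evenly (eps/3) over the other bases.
    """
    n1, n2, n3, n4 = sorted(counts, reverse=True)  # major, minor, two error-only bases
    n = n1 + n2 + n3 + n4
    if n == 0:
        raise ValueError("no reads at this site")
    eps = 1.5 * (n3 + n4) / n              # error-only bases carry total probability 2*eps/3
    denom = 1.0 - 4.0 * eps / 3.0          # P(major read) = eps/3 + p * denom
    if denom <= 0:
        raise ValueError("all four bases equally common; p is not identifiable")
    p = (n1 / n - eps / 3.0) / denom       # sorting guarantees 1/2 <= p <= 1
    return p, eps


def lrt_monomorphism(counts: Sequence[int]) -> Tuple[float, float]:
    """Likelihood-ratio statistic and approximate p-value for H0: p = 1 (monomorphic)."""
    n1, n2, n3, n4 = sorted(counts, reverse=True)
    n = n1 + n2 + n3 + n4
    if n == 0:
        raise ValueError("no reads at this site")
    # Full model: ML cell probabilities n1/n, n2/n, and (n3+n4)/(2n) per error-only base.
    log_l1 = (_xlogy(n1, n1 / n) + _xlogy(n2, n2 / n)
              + _xlogy(n3 + n4, (n3 + n4) / (2 * n)))
    # Null model: every non-major read is an error, so eps0 = (n - n1)/n, eps0/3 per base.
    eps0 = (n - n1) / n
    log_l0 = _xlogy(n1, 1.0 - eps0) + _xlogy(n - n1, eps0 / 3.0)
    stat = max(0.0, 2.0 * (log_l1 - log_l0))  # clamp tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value
```

For example, `ml_allele_frequency([50, 2, 1, 1])` attributes the two singleton bases to error (ε̂ = 1/18) and returns p̂ = 0.98. Because p = 1 lies on the boundary of the parameter space, a 50:50 mixture of χ²₀ and χ²₁ is a common alternative reference distribution. Using it would halve the p-value reported here.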

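Why ignoring one tier of sampling inflates false positives can be sketched with a simple two-stage model. This is not necessarily the paper's framework. It assumes the following: N diploid individuals contribute equally to the pool (2N chromosomes), n reads are drawn independently with replacement from the pool, and sequencing error is ignored. Under these assumptions the read-based estimate has variance $p(1-p)\left[\frac{1}{2N} + \frac{1}{n} - \frac{1}{2Nn}\right]$. A naive binomial analysis of reads keeps only the $p(1-p)/n$ part. The function names, the normal-approximation z test, and the simulation settings below are hypothetical choices for illustration.

```python
import math
import random


def two_tier_variance(p: float, n_individuals: int, coverage: int) -> float:
    """Variance of a read-count frequency estimate: individuals, then reads (assumed model)."""
    m = 2 * n_individuals  # chromosomes in the pool (diploids assumed)
    return p * (1 - p) * (1 / m + 1 / coverage - 1 / (m * coverage))


def naive_variance(p: float, coverage: int) -> float:
    """Read-level binomial variance that ignores the sampling of individuals."""
    return p * (1 - p) / coverage


def two_sided_p(px: float, py: float, var_diff: float) -> float:
    """Two-sided normal-approximation p-value for H0: equal allele frequencies."""
    if var_diff <= 0:
        return 1.0
    z = (px - py) / math.sqrt(var_diff)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_pool(rng: random.Random, p: float, n_individuals: int, coverage: int) -> float:
    """One pooled-sequencing frequency estimate drawn under the two-tier model."""
    m = 2 * n_individuals
    q = sum(rng.random() < p for _ in range(m)) / m          # tier 1: individuals
    return sum(rng.random() < q for _ in range(coverage)) / coverage  # tier 2: reads


def false_positive_rates(p=0.3, n_individuals=50, coverage=1000,
                         reps=1000, alpha=0.05, seed=1):
    """Rejection rates of naive vs two-tier tests when both populations share p."""
    rng = random.Random(seed)
    naive_hits = tiered_hits = 0
    for _ in range(reps):
        px = simulate_pool(rng, p, n_individuals, coverage)
        py = simulate_pool(rng, p, n_individuals, coverage)
        pbar = (px + py) / 2  # pooled frequency estimate under H0
        if two_sided_p(px, py, 2 * naive_variance(pbar, coverage)) < alpha:
            naive_hits += 1
        if two_sided_p(px, py, 2 * two_tier_variance(pbar, n_individuals, coverage)) < alpha:
            tiered_hits += 1
    return naive_hits / reps, tiered_hits / reps
```

Calling `false_positive_rates()` should give a naive rejection rate far above the nominal 0.05, while the two-tier rate stays near it. Raising `coverage` while holding `n_individuals` fixed widens the gap. The naive variance keeps shrinking like 1/n, but the 1/(2N) term from sampling individuals does not. This mirrors the abstract's point that deeper coverage can make a one-tier analysis more misleading.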