PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals

Recent statistical analyses suggest that sequencing pooled samples is a cost-effective approach for estimating genome-wide population genetic parameters. Here we introduce PoPoolation, a toolbox designed specifically for the population genetic analysis of sequence data from pooled individuals. PoPoolation calculates estimates of Watterson's θ, θ_π, and Tajima's D that correct for the biases introduced by pooling and by sequencing errors, and it also estimates divergence between species. Results of genome-wide analyses can be displayed graphically as sliding-window plots. PoPoolation is written in Perl and R and builds on commonly used data formats. Its source code can be downloaded from http://code.google.com/p/popoolation/. Furthermore, we evaluate how the mapping algorithm, sequencing errors, and read coverage affect the accuracy of population genetic parameter estimates from pooled data.
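The abstract does not spell out PoPoolation's corrected estimators. For orientation only, the sketch below computes the classical, uncorrected estimators: Watterson's θ, θ_π (mean pairwise differences), and Tajima's D using Tajima's (1989) variance constants. It reports them per window, the way a sliding-window plot would use them. It deliberately omits the pooling and sequencing-error corrections that PoPoolation adds. The function names, the input layout (a dict from 0-based position to allele counts), and the assumption that every site is sampled from exactly n sequences are all hypothetical simplifications. In real pooled data, coverage varies from site to site.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple


def harmonic_sums(n: int) -> Tuple[float, float]:
    """a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1..n-1 (Watterson/Tajima constants)."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def site_pi(counts: Sequence[int]) -> float:
    """Fraction of differing pairs among the n sequences at one site: n/(n-1) * (1 - sum p_k^2)."""
    n = sum(counts)
    if n < 2:
        return 0.0
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


def classical_stats(sites: List[Sequence[int]], n: int) -> Dict[str, float]:
    """Uncorrected theta_W, theta_pi and Tajima's D, summed over the given sites.

    Hypothetical input: each entry holds allele counts at one site, sampled from exactly n sequences.
    No pooling or sequencing-error correction is applied here.
    """
    for c in sites:
        if sum(c) != n:
            raise ValueError("every site must be sampled from exactly n sequences")
    segregating = [c for c in sites if sum(1 for x in c if x > 0) > 1]
    s = len(segregating)
    a1, a2 = harmonic_sums(n)
    theta_w = s / a1
    theta_pi = sum(site_pi(c) for c in segregating)  # monomorphic sites contribute 0
    # Tajima (1989) constants for the variance of (theta_pi - theta_W).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    var = e1 * s + e2 * s * (s - 1)
    tajima_d = (theta_pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return {"S": s, "theta_w": theta_w, "theta_pi": theta_pi, "tajima_d": tajima_d}


def sliding_windows(
    counts_by_pos: Dict[int, Sequence[int]], length: int, n: int, window: int, step: int
) -> Iterator[Tuple[int, int, float, float, float]]:
    """Yield (start, end, theta_W per site, theta_pi per site, Tajima's D) for each window."""
    for start in range(0, length - window + 1, step):
        end = start + window
        sites = [c for pos, c in counts_by_pos.items() if start <= pos < end]
        st = classical_stats(sites, n)
        yield start, end, st["theta_w"] / window, st["theta_pi"] / window, st["tajima_d"]
```

As an example, with n = 4 and the sites {10: [3, 1], 60: [2, 2], 150: [4, 0]} on a 200 bp region, windows of 100 bp with a step of 100 give S = 2 in the first window and no segregating sites in the second, where D is undefined (NaN). Dividing by the window length yields per-site values. This also assumes that every position in the window was covered, which a real pooled pipeline would have to check.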
