Testing for Neutrality in Samples With Sequencing Errors

Many data sets one could use for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We show that in the presence of those errors, estimators of θ can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of data sets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these errors, we propose two new estimators of θ that ignore singletons, as well as two new tests, Y and Y*, that can be used to test neutrality despite sequencing errors. Even though singletons are ignored, these new tests retain some power to detect deviations from a standard neutral model. We therefore advise using these new tests to strengthen conclusions drawn from data sets suspected of containing sequencing errors.
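The abstract does not give the form of the new estimators, so the following is only a minimal sketch of the idea of ignoring singletons. It assumes an unfolded site frequency spectrum and the standard neutral expectation E[ξ_i] = θ/i, from which singleton-free analogues of the Watterson and pairwise-difference estimators follow by rescaling. The function name `theta_estimators` and the dictionary keys are hypothetical, and the formulas are derived from that expectation rather than taken from the paper.

```python
from math import comb


def theta_estimators(sfs):
    """Estimate theta from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of sites where the derived allele is carried
    by exactly i of the n sampled sequences, for i = 1 .. n - 1.
    Assumption: under the standard neutral model E[xi_i] = theta / i.
    """
    n = len(sfs) + 1
    if n < 3:
        raise ValueError("need at least 3 sequences to drop singletons")

    a_n = sum(1.0 / i for i in range(1, n))  # harmonic number H_{n-1}
    s_total = sum(sfs)                       # number of segregating sites
    singletons = sfs[0]

    # Classical estimators, which use every frequency class.
    theta_s = s_total / a_n
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / comb(n, 2)

    # Singleton-free analogues: drop class i = 1 and rescale so that the
    # expectation under neutrality is still theta.
    theta_s_no1 = (s_total - singletons) / (a_n - 1.0)
    theta_pi_no1 = sum(
        i * (n - i) * x for i, x in enumerate(sfs, start=1) if i >= 2
    ) / comb(n - 1, 2)

    return {
        "theta_S": theta_s,
        "theta_pi": theta_pi,
        "theta_S_no_singletons": theta_s_no1,
        "theta_pi_no_singletons": theta_pi_no1,
    }


if __name__ == "__main__":
    theta, n = 5.0, 10
    expected = [theta / i for i in range(1, n)]   # neutral expectation
    with_errors = [expected[0] + 20.0] + expected[1:]  # 20 spurious singletons
    print(theta_estimators(expected))
    print(theta_estimators(with_errors))
```

On the expected neutral spectrum all four values coincide with θ. Adding spurious singletons inflates the two classical estimators while leaving the singleton-free ones unchanged, which is the bias the abstract describes.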
