Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach.

Polymorphism discovery is a routine application of next-generation sequencing technology in which multiple samples are sent to a service provider for library preparation, sequencing, and bioinformatic analysis. Decreasing costs and advances in multiplexing have made it possible to analyze hundreds of samples at a reasonable cost. However, because the initial processing of samples and the handling of sequencing equipment involve manual steps, cross-contamination remains a significant challenge. It is especially problematic when polymorphism frequencies do not adhere to diploid expectations, for example in heterogeneous tumor samples, organellar genomes, and bacterial and viral sequencing. In these cases, low levels of contamination can readily be mistaken for polymorphisms, leading to false results. Here we describe practical steps designed to reliably detect contamination and uncover its origin, and we provide new, readily accessible Galaxy-based computational tools and workflows for quality control. All results described in this report can be reproduced interactively on the web as described at http://usegalaxy.org/contamination.
