Contamination detection in sequencing studies using the mitochondrial phylogeny

Within-species contamination is a major issue in sequencing studies, especially in mitochondrial studies. Contamination can be detected by analyzing the nuclear genome or by inspecting polymorphic sites in the mitochondrial genome (mtDNA). Existing methods that use the nuclear genome are computationally expensive, and no appropriate tool is available for detecting sample contamination in large-scale mtDNA data sets. Here we present haplocheck, a tool that requires only the mtDNA to detect contamination in both targeted mitochondrial and whole-genome sequencing studies. Our in silico simulations and amplicon mixture experiments indicate that haplocheck detects mtDNA contamination accurately and that detection is independent of the phylogenetic distance within a sample mixture. By applying haplocheck to data from the 1000 Genomes Project Consortium, we further evaluate haplocheck as a fast mtDNA-based proxy for contamination detection based on nuclear DNA (nDNA), and we identify the mitochondrial copy number within a mixture as a critical component for the overall accuracy. The haplocheck tool is available both as a command-line tool and as a cloud web service that produces interactive reports, which facilitate navigation through the phylogeny of contaminated samples.
