mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud

Next-generation sequencing (NGS) makes it possible to investigate mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) in greater detail. Several pipelines for analyzing heteroplasmies exist, but problems with usability, accuracy of results and interpretation of the final data limit their use. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size, with a special focus on usability and on the reliable identification and quantification of heteroplasmic variants. The mtDNA-Server workflow includes parallel read alignment, heteroplasmy detection, identification of artefacts and contamination, variant annotation, and several quality control metrics that are often neglected in current mtDNA NGS studies. All computational steps are parallelized with Hadoop MapReduce and executed through the graphical interface of Cloudgene. We validated the underlying heteroplasmy and contamination detection model by generating four artificial sample mix-ups on two different NGS devices. Our evaluation shows that mtDNA-Server detects heteroplasmies and artificial recombinations down to the 1% level with perfect specificity and outperforms existing approaches in sensitivity. mtDNA-Server can currently analyze the 1000G Phase 3 data (n = 2,504) in less than 5 h and is freely accessible at https://mtdna-server.uibk.ac.at.
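To make the idea of quantifying a heteroplasmy at the 1% level concrete, the following is a minimal sketch of how a minor-allele level could be computed from per-strand base counts at one position. It is not the detection model used by mtDNA-Server, which the abstract does not describe. The function names `strand_level` and `call_heteroplasmy`, the `min_depth` cutoff of 10 reads per strand, and the rule that the minor base must reach the threshold on both strands are all assumptions made for illustration.

```python
BASES = ("A", "C", "G", "T")


def strand_level(counts, base):
    """Fraction of reads on one strand that carry `base` (0.0 if no coverage)."""
    depth = sum(counts.get(b, 0) for b in BASES)
    return counts.get(base, 0) / depth if depth else 0.0


def call_heteroplasmy(forward, reverse, min_level=0.01, min_depth=10):
    """Return major base, minor base and minor-allele level for one site.

    Hypothetical rule: the site is reported only if the minor base reaches
    `min_level` on BOTH strands and each strand has at least `min_depth`
    reads. Returns None otherwise.
    """
    combined = {b: forward.get(b, 0) + reverse.get(b, 0) for b in BASES}
    ranked = sorted(BASES, key=lambda b: combined[b], reverse=True)
    major, minor = ranked[0], ranked[1]

    # No second allele observed at all: the site is homoplasmic.
    if combined[minor] == 0:
        return None

    for counts in (forward, reverse):
        depth = sum(counts.get(b, 0) for b in BASES)
        if depth < min_depth:
            return None
        if strand_level(counts, minor) < min_level:
            return None

    level = combined[minor] / sum(combined.values())
    return {"major": major, "minor": minor, "level": level}


if __name__ == "__main__":
    # Minor base G supported on both strands.
    print(call_heteroplasmy({"A": 980, "G": 20}, {"A": 970, "G": 30}))
    # Minor base G seen on the forward strand only: rejected as strand-biased.
    print(call_heteroplasmy({"A": 960, "G": 40}, {"A": 1000}))
```

Requiring support on both strands is one common way to suppress strand-biased sequencing artefacts; a production pipeline would typically also filter on base and mapping quality, which this sketch omits.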
