Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data

Background

Whole Exome Sequencing (WES) is one of the most widely used and cost-effective next-generation sequencing technologies, allowing all nuclear exons to be sequenced. Off-target regions may be captured if they share high sequence similarity with the baits. Bioinformatics tools have been optimized to retrieve large amounts of off-target mitochondrial DNA (mtDNA) from WES data by exploiting the aspecificity of probes that partially overlap Nuclear mitochondrial Sequences (NumtS). The 1000 Genomes project is one of the largest resources from which mtDNA sequences can be extracted from WES data, which is valuable given the large effort the scientific community is devoting to reconstructing human population history using mtDNA as a marker, and given the involvement of mtDNA in pathology.

Results

A previously published pipeline for assembling mitochondrial genomes from off-target WES reads, further improved to detect insertions and deletions (indels) and heteroplasmy, was applied to a dataset of 1242 samples from the 1000 Genomes project. It yielded a nearly complete mitochondrial genome for 943 samples (76% of the analyzed exomes). The robustness of our computational strategy was highlighted by the reduction, due to NumtS filtering, in the number of reads recognized as mitochondrial compared with the original annotation produced by the Consortium.

An accurate survey was carried out on 1242 individuals: 215 indels, mostly heteroplasmic, and 3407 single-base variants were mapped. Mismatches were homogeneously distributed along the whole mitochondrial genome, while indels were less frequent within protein-coding regions, where frameshift mutations may be deleterious. The majority of the indels and mismatches found were not previously annotated in mitochondrial databases, since conventional sequencing methods were limited to detecting homoplasmy or quasi-homoplasmy. Intriguingly, upon filtering out non haplogroup-defining variants, we detected a widespread population occurrence of rare events predicted to be damaging.

Finally, samples were stratified into blood-derived and lymphoblastoid-derived groups to detect possibly different trends of mutability in the two datasets; this analysis did not reveal significant discordances.

Conclusions

To the best of our knowledge, this is likely the most extensive population-scale mitochondrial genotyping in humans to date, enriched with the estimation of heteroplasmies.
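As an illustration of what estimating heteroplasmy at a single site can involve, the following minimal Python sketch computes the fraction of reads that do not support the reference base and labels the site accordingly. It is not the pipeline used in this work: the function names, the input format (a plain sequence of base calls at one position) and the 5% cut-off are all assumptions made for the example.

```python
from collections import Counter


def heteroplasmy_fraction(base_calls, reference_base):
    """Fraction of reads at one site that do not support the reference base.

    `base_calls` is a hypothetical input: any iterable of single-letter base
    calls covering one mtDNA position (for example, taken from a pileup).
    """
    counts = Counter(base_calls)
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this site")
    return (depth - counts[reference_base]) / depth


def classify_site(fraction, min_fraction=0.05):
    """Label a site from its non-reference read fraction.

    The 5% cut-off is an illustrative assumption, not a value from the paper.
    """
    if fraction < min_fraction:
        return "reference"
    if fraction > 1 - min_fraction:
        return "homoplasmic variant"
    return "heteroplasmic"


# Example: 80 reads support the reference base A and 20 reads support G.
calls = "A" * 80 + "G" * 20
fraction = heteroplasmy_fraction(calls, "A")
print(fraction, classify_site(fraction))
```

A real analysis would also need to account for base and mapping quality, strand bias, and NumtS-derived reads before trusting such a fraction, which this sketch deliberately leaves out.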
