Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA)

The assignment of haplogroups to mitochondrial DNA haplotypes adds substantial value to quality control, not only in forensic genetics but also in population and medical genetics. The availability of Phylotree, a widely accepted phylogenetic tree of human mitochondrial DNA lineages, led to the development of several (semi-)automated software solutions for haplogrouping. However, existing haplogrouping tools use only haplogroup-defining mutations, whereas private mutations (beyond the haplogroup level) can be additionally informative and allow for enhanced haplogroup assignment. This is especially relevant in the case of (partial) control region sequences, which are mainly used in forensics. The present study makes three major contributions toward a more reliable, semi-automated estimation of mitochondrial haplogroups. First, a quality-controlled database consisting of 14,990 full mtGenomes downloaded from GenBank was compiled. Together with Phylotree, these mtGenomes serve as a reference database for haplogroup estimates. Second, the concept of fluctuation rates is presented, i.e. a maximum likelihood estimation of the stability of mutations based on 19,171 full control region haplotypes for which raw lane data are available. Finally, an algorithm is put forward that estimates the haplogroup of an mtDNA sequence from the combined database of full mtGenomes and Phylotree and that also incorporates the empirically determined fluctuation rates. Using examples from the literature and EMPOP, the algorithm is validated, and both the strength of this approach and its utility for quality control of mitochondrial haplotypes are demonstrated.
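
To illustrate the general idea of weighting mutations by their stability, the following is a minimal sketch and not the published EMMA algorithm. It assumes that each mutation has a fluctuation rate in (0, 1] that can be treated as the probability of observing a mismatch at that position, so that minimizing the summed negative log rates corresponds to maximizing a simple product likelihood. The haplogroup labels (`HG_A`, `HG_B`), the motifs, the rate values, and all function names are hypothetical and chosen purely for demonstration.

```python
import math

# Hypothetical fluctuation rates in (0, 1]; higher means the mutation is less
# stable, so a mismatch there is penalized less. All values are invented.
FLUCTUATION = {"16519C": 0.9, "152C": 0.6, "73G": 0.05, "263G": 0.05, "16223T": 0.1}
DEFAULT_RATE = 0.2  # assumed fallback for mutations without an estimated rate

# Toy haplogroup motifs (sets of mutations relative to a reference sequence).
MOTIFS = {
    "HG_A": {"73G", "263G", "16223T"},
    "HG_B": {"73G", "263G", "152C"},
}


def mismatch_cost(query, motif):
    """Sum of -log(rate) over mutations present in only one of the two sets."""
    differing = query ^ motif  # symmetric difference: missing or private mutations
    return sum(-math.log(FLUCTUATION.get(m, DEFAULT_RATE)) for m in differing)


def rank_haplogroups(query):
    """Return (haplogroup, cost) pairs, lowest cost (best candidate) first."""
    costs = {hg: mismatch_cost(query, motif) for hg, motif in MOTIFS.items()}
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = {"73G", "263G", "16223T", "16519C"}
    for haplogroup, cost in rank_haplogroups(sample):
        print(f"{haplogroup}: {cost:.3f}")
```

In this toy setup, the extra mutation `16519C` barely affects the ranking because its assumed rate is high, whereas a mismatch at a stable position such as `16223T` is costly. A real implementation would draw its candidates from Phylotree and the reference mtGenomes rather than from a hand-written dictionary.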
