Rebooting the human mitochondrial phylogeny: an automated and scalable methodology with expert knowledge

Background: Mitochondrial DNA is an ideal source of information for evolutionary and phylogenetic studies due to its extraordinary properties and abundance. Many insights can be gained from these studies, including but not limited to screening genetic variation to identify potentially deleterious mutations. However, such advances require efficient solutions to very difficult computational problems, a need made harder by the very abundance of data that gives the analysis its strength.

Results: We develop a systematic, automated methodology to overcome these difficulties, building from readily available, public sequence databases to high-quality alignments and phylogenetic trees. Within each stage of an autonomous workflow, outputs are carefully evaluated and outlier detection rules are defined to integrate expert knowledge and automated curation, thereby avoiding the manual bottleneck found in past approaches to the problem. Using these techniques, we have performed exhaustive updates to the human mitochondrial phylogeny, illustrating the power and computational scalability of our approach, and we have conducted some initial analyses on the resulting phylogenies.

Conclusions: The problem at hand demands careful definition of inputs and adequate algorithmic treatment for its solutions to be realistic and useful. Formal rules can be defined to address the former requirement by refining inputs directly and through their combination as outputs; these outputs also help to assess the performance of the chosen algorithms. Rules can exploit known or inferred properties of datasets to simplify inputs through partitioning, thereby cutting computational costs and making it feasible to work on rapidly growing, otherwise intractable datasets. Although expert guidance may be necessary to assist the learning process, low-risk results can be fully automated and have proved convenient and valuable.
