OMA standalone: orthology inference among public and custom genomes and transcriptomes.

Genomes and transcriptomes are now typically sequenced by individual laboratories, but analyzing them often remains challenging. One essential step in many analyses is identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families that have undergone duplications or losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open-source Mozilla Public License Version 2.0.
