morFeus: a web-based program to detect remotely conserved orthologs using symmetrical best hits and orthology network scoring

BackgroundSearching the orthologs of a given protein or DNA sequence is one of the most important and most commonly used Bioinformatics methods in Biology. Programs like BLAST or the orthology search engine Inparanoid can be used to find orthologs when the similarity between two sequences is sufficiently high. They however fail when the level of conservation is low. The detection of remotely conserved proteins oftentimes involves sophisticated manual intervention that is difficult to automate.ResultsHere, we introduce morFeus, a search program to find remotely conserved orthologs. Based on relaxed sequence similarity searches, morFeus selects sequences based on the similarity of their alignments to the query, tests for orthology by iterative reciprocal BLAST searches and calculates a network score for the resulting network of orthologs that is a measure of orthology independent of the E-value. Detecting remotely conserved orthologs of a protein using morFeus thus requires no manual intervention. We demonstrate the performance of morFeus by comparing it to state-of-the-art orthology resources and methods. We provide an example of remotely conserved orthologs, which were experimentally shown to be functionally equivalent in the respective organisms and therefore meet the criteria of the orthology-function conjecture.ConclusionsBased on our results, we conclude that morFeus is a powerful and specific search method for detecting remotely conserved orthologs. morFeus is freely available at http://bio.biochem.mpg.de/morfeus/. Its source code is available from Sourceforge.net (https://sourceforge.net/p/morfeus/).

[1]  P. Bonacich Factoring and weighting approaches to status scores and clique identification , 1972 .

[2]  Bianca Habermann,et al.  Swm1/Apc13 Is an Evolutionarily Conserved Subunit of the Anaphase-Promoting Complex Stabilizing the Association of Cdc16 and Cdc27 , 2004, Molecular and Cellular Biology.

[3]  Kimmen Sjölander,et al.  The PhyloFacts FAT-CAT web server: ortholog identification and function prediction using fast approximate tree classification , 2013, Nucleic Acids Res..

[4]  Kimmen Sjölander,et al.  Berkeley PHOG: PhyloFacts orthology group prediction web server , 2009, Nucleic Acids Res..

[5]  Salvador Capella-Gutiérrez,et al.  PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions , 2010, Nucleic Acids Res..

[6]  Trey Ideker,et al.  Cytoscape 2.8: new features for data integration and network visualization , 2010, Bioinform..

[7]  Erik L. L. Sonnhammer,et al.  Inparanoid: a comprehensive database of eukaryotic orthologs , 2004, Nucleic Acids Res..

[8]  E. Koonin,et al.  Functional and evolutionary implications of gene orthology , 2013, Nature Reviews Genetics.

[9]  A. Biegert,et al.  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.

[10]  D. Haussler,et al.  Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. , 1998, Journal of molecular biology.

[11]  C. Stoeckert,et al.  OrthoMCL: identification of ortholog groups for eukaryotic genomes. , 2003, Genome research.

[12]  David Haussler,et al.  The UCSC genome browser database: update 2007 , 2006, Nucleic Acids Res..

[13]  Maricel G Kann,et al.  Performance evaluation of a new algorithm for the detection of remote homologs with sequence comparison , 2002, Proteins.

[14]  Mary Goldman,et al.  The UCSC Genome Browser database: update 2011 , 2010, Nucleic Acids Res..

[15]  Darren A. Natale,et al.  The COG database: an updated version includes eukaryotes , 2003, BMC Bioinformatics.

[16]  Albert J. Vilella,et al.  EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. , 2009, Genome research.

[17]  Erik L. L. Sonnhammer,et al.  InParanoid 7: new algorithms and tools for eukaryotic orthology analysis , 2009, Nucleic Acids Res..

[18]  D. P. Wall,et al.  Detecting putative orthologs , 2003, Bioinform..

[19]  Gautier Koscielny,et al.  Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species , 2011, Nucleic Acids Res..

[20]  M Kann,et al.  Optimization of a new score function for the detection of remote homologs , 2000, Proteins.

[21]  Thomas D. Cuypers,et al.  Iterative orthology prediction uncovers new mitochondrial proteins and identifies C12orf62 as the human ortholog of COX14, a protein involved in the assembly of cytochrome c oxidase , 2012, Genome Biology.

[22]  Sean R. Eddy,et al.  Profile hidden Markov models , 1998, Bioinform..

[23]  S. Eddy Hidden Markov models. , 1996, Current opinion in structural biology.

[24]  M. Ruggero,et al.  Similarity of Traveling-Wave Delays in the Hearing Organs of Humans and Other Tetrapods , 2007, Journal for the Association for Research in Otolaryngology.

[25]  W. Fitch Distinguishing homologous from analogous proteins. , 1970, Systematic zoology.

[26]  Christian von Mering,et al.  eggNOG: automated construction and annotation of orthologous groups of genes , 2007, Nucleic Acids Res..

[27]  M. Sternberg,et al.  Benchmarking PSI-BLAST in genome annotation. , 1999, Journal of molecular biology.

[28]  Aric Hagberg,et al.  Exploring Network Structure, Dynamics, and Function using NetworkX , 2008, Proceedings of the Python in Science Conference.

[29]  Johannes Söding,et al.  HHsenser: exhaustive transitive profile search using HMM–HMM comparison , 2006, Nucleic Acids Res..

[30]  Frances M. G. Pearl,et al.  The CATH extended protein‐family database: Providing structural annotations for genome sequences , 2002, Protein science : a publication of the Protein Society.