New Tools in Orthology Analysis: A Brief Review of Promising Perspectives

Nowadays, defining homology relationships among sequences is essential for biological research. Within homology, the analysis of orthologous sequences is of great importance for computational biology, genome annotation, and phylogenetic inference. Since 2007, as the number of new sequences deposited in large biological databases has grown, researchers have begun to evaluate computational methodologies and tools in order to select the most promising ones for predicting orthologous groups. The literature in this field describes problems shared by most available tools: limited accuracy, long analysis times (especially given the growing volume of submitted data, which demands faster techniques), and difficulty automating the process without manual intervention. Searching BMC, Google Scholar, NCBI PubMed, and Expasy, we examined more than 600 articles describing the most recent techniques and tools developed to solve the problems that still exist in orthology detection. We list the main computational tools created and developed between 2011 and 2017, taking into account differences in the type of orthology analysis, outlining the main features of each tool, and pointing out the problems that each one tries to address. We also observed that several tools still use the BLAST “all-against-all” methodology as their main algorithm. This approach entails some limitations, such as a limited number of queries, high computational cost, and long processing time.
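The cost of the all-against-all approach follows from its structure. Every sequence is compared with every sequence in the pooled set, so the number of query/subject pairs grows with the square of the total number of proteins. The minimal Python sketch below makes that arithmetic explicit. The proteome sizes are hypothetical. The function name `all_vs_all_pairs` is illustrative only. The result counts candidate pairs, not BLAST running time, which also depends on BLAST's heuristics and the hardware used.

```python
def all_vs_all_pairs(proteome_sizes):
    """Count query/subject pairs in an all-against-all search (illustrative).

    Assumes every protein is used as a query against every protein in the
    pooled database, self-hits included, so the work grows with the square
    of the total number of sequences.
    """
    total = sum(proteome_sizes)
    return total * total


# Hypothetical proteome sizes, roughly in the range of bacterial genomes.
four_genomes = [4000, 4000, 4000, 4000]
eight_genomes = four_genomes * 2

print(all_vs_all_pairs(four_genomes))   # 256000000
print(all_vs_all_pairs(eight_genomes))  # 1024000000
```

Doubling the number of genomes quadruples the number of comparisons. This is why processing time quickly becomes a bottleneck as more genomes are added to an analysis.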
However, promising new tools are being developed, such as OrthoVenn (which uses a Venn diagram to show the relationships among the ortholog groups generated by its algorithm), ReMark (which integrates the pipeline to automate the entry process), OrthAgogue (which uses algorithms designed to minimize processing time), and proteinOrtho (which improves the accuracy of ortholog groups and was developed for dealing with large amounts of biological data). We compared the main features of four tools and tested them on four prokaryotic genomes. We hope that our review will be useful to researchers and help them select the most appropriate tool for their work in the field of orthology.
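A Venn diagram of ortholog clusters typically shows, for each combination of species, how many clusters are shared by exactly that combination. The sketch below is a hypothetical illustration of that tally, not OrthoVenn's own code. It assumes cluster membership is already available as a mapping from a cluster identifier to the set of species contributing sequences to that cluster. The species labels, clusters, and the function name `venn_region_counts` are invented. With four genomes, as in the comparison described above, a diagram has 2^4 − 1 = 15 non-empty regions.

```python
from collections import Counter
from itertools import combinations


def venn_region_counts(clusters, species):
    """Tally ortholog clusters per Venn region (hypothetical sketch).

    clusters: mapping of cluster id -> set of species with at least one member.
    species:  ordered list of species labels; region keys follow this order.
    Returns a Counter keyed by the exact tuple of species sharing a cluster.
    """
    counts = Counter()
    for members in clusters.values():
        # The region is the exact set of listed species present in the cluster.
        key = tuple(s for s in species if s in members)
        if key:
            counts[key] += 1
    # Make every non-empty region explicit, including regions with no clusters.
    for r in range(1, len(species) + 1):
        for combo in combinations(species, r):
            counts.setdefault(combo, 0)
    return counts


# Invented example: four genomes and five ortholog clusters.
species = ["A", "B", "C", "D"]
clusters = {
    "c1": {"A", "B", "C", "D"},  # core cluster shared by all four genomes
    "c2": {"A", "B"},
    "c3": {"A", "B"},
    "c4": {"C"},                 # cluster unique to genome C
    "c5": {"B", "D"},
}
regions = venn_region_counts(clusters, species)
print(regions[("A", "B")])  # 2
print(len(regions))         # 15
```

Keying regions by exact species combinations, rather than by pairwise overlaps, matches how a Venn diagram partitions clusters. Each cluster is therefore counted in exactly one region.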