Broccoli: combining phylogenetic and network analyses for orthology assignment

Orthology assignment is a key step in comparative genomic studies, and many bioinformatic tools have been developed for it. However, all gene clustering pipelines are based on the analysis of protein distances, which are subject to many artefacts. In this paper we introduce Broccoli, a user-friendly pipeline designed to infer, with high precision, orthologous groups and pairs of proteins using a phylogeny-based approach. Briefly, Broccoli performs ultra-fast phylogenetic analyses on most proteins and builds a network of orthologous relationships. Orthologous groups are then identified from the network using a parameter-free machine learning algorithm. Broccoli is also able to detect chimeric proteins resulting from gene-fusion events and to assign these proteins to the corresponding orthologous groups. Tested on two benchmark datasets, Broccoli outperforms current orthology pipelines. In addition, Broccoli is scalable, with runtimes similar to those of recent distance-based pipelines. Given its high level of performance and efficiency, this new pipeline represents a suitable choice for comparative genomic studies. Broccoli is freely available at https://github.com/rderelle/Broccoli.
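
To make the network step more concrete, the sketch below groups proteins in a small weighted network of orthologous relationships. The abstract does not name the clustering algorithm, so label propagation is used here only as an assumed stand-in for a parameter-free community detection method. The function name, the protein identifiers and the edge weights are all hypothetical, and the fixed node order and tie-breaking are a simplification that keeps the toy example deterministic (standard label propagation visits nodes and breaks ties at random). This is not Broccoli's actual implementation.

```python
from collections import defaultdict


def label_propagation(edges, max_iter=100):
    """Group nodes of a weighted undirected graph by label propagation.

    edges: iterable of (node_a, node_b, weight) tuples.
    Returns a sorted list of groups, each a sorted list of node names.
    """
    # Build a symmetric adjacency map: node -> {neighbour: weight}.
    neighbours = defaultdict(dict)
    for a, b, weight in edges:
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    # Every node starts in its own group, labelled by its own name.
    labels = {node: node for node in neighbours}

    for _ in range(max_iter):
        changed = False
        # Assumption: fixed (sorted) visiting order instead of a random one.
        for node in sorted(neighbours):
            # Sum the edge weights supporting each neighbouring label.
            scores = defaultdict(float)
            for other, weight in neighbours[node].items():
                scores[labels[other]] += weight
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # Keep the current label when it is tied for best,
            # otherwise take the alphabetically smallest best label.
            new = labels[node] if labels[node] in candidates else candidates[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break

    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy network: two gene families (A and B) in three species,
# strongly connected within each family and weakly linked between them.
toy_edges = [
    ("hs_A", "mm_A", 1.0), ("hs_A", "dr_A", 1.0), ("mm_A", "dr_A", 1.0),
    ("hs_B", "mm_B", 1.0), ("hs_B", "dr_B", 1.0), ("mm_B", "dr_B", 1.0),
    ("hs_A", "hs_B", 0.2),
]

if __name__ == "__main__":
    for group in label_propagation(toy_edges):
        print(group)
```

On this toy input the weak link between the two families is outweighed by the within-family edges, so the proteins separate into one group per family without any user-set threshold.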
