SonicParanoid: fast, accurate and easy orthology inference

Motivation: Orthology inference underlies many genome‐based studies: it is a prerequisite for annotating new genomes, finding target genes for biotechnological applications and revealing the evolutionary history of life. Although its importance keeps rising with the ever‐growing number of sequenced genomes, existing tools are computationally demanding and difficult to use.

Results: Here, we present SonicParanoid, which is faster than well‐established tools yet comparably accurate, with a balanced precision‐recall trade‐off. Furthermore, SonicParanoid substantially eases orthology inference for those who need to construct and maintain their own genomic datasets.

Availability and implementation: SonicParanoid is available under a GNU GPLv3 license on the Python Package Index and BitBucket. Documentation is available at http://iwasakilab.bs.s.u-tokyo.ac.jp/sonicparanoid.

Supplementary information: Supplementary data are available at Bioinformatics online.
