Improved orthology inference with Hieranoid 2

Motivation: The initial step in many orthology inference methods is the computationally demanding establishment of all pairwise protein similarities across all analysed proteomes. Because this cost scales quadratically with the number of proteomes, it has become a major bottleneck. The Hieranoid algorithm offers a remedy: it reduces the complexity from quadratic to linear by hierarchically aggregating ortholog groups from InParanoid along a species tree.

Results: We have further developed the Hieranoid algorithm in many ways. Major improvements have been made to the construction of multiple sequence alignments and consensus sequences. Hieranoid version 2 was evaluated with standard benchmarks, which reveal a dramatic improvement in the coverage/accuracy trade-off over version 1, such that it now compares favourably with the best methods. The new parallelized cluster mode allows Hieranoid to be run on large data sets in much less time than InParanoid, yet with similar accuracy.

Contact: mateusz.kaduk@scilifelab.se

Availability and Implementation: Perl code is freely available at http://hieranoid.sbc.su.se/.

Supplementary information: Supplementary data are available at Bioinformatics online.
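The scaling argument in the Motivation can be made concrete by counting comparisons. The Python sketch below is an illustration only, not Hieranoid's Perl implementation. It assumes a rooted, fully bifurcating species tree written as nested tuples, and it assumes that each internal node costs exactly one pairwise comparison between the (pseudo-)proteomes of its two children. Under those assumptions an all-vs-all scheme needs n(n-1)/2 proteome pairs, whereas the hierarchical walk needs one comparison per internal node, and a binary tree with n leaves has n-1 internal nodes. The function names `leaves`, `all_vs_all_pairs` and `hierarchical_merges` are hypothetical.

```python
"""Count proteome comparisons for all-vs-all versus hierarchical merging.

Illustrative sketch only (not Hieranoid's Perl code). Assumes a rooted,
fully bifurcating species tree and one pairwise comparison per internal node.
"""
from typing import Tuple, Union

# Hypothetical representation: a leaf is a species name, an internal node
# is a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> list[str]:
    """Return the species names at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def all_vs_all_pairs(n: int) -> int:
    """Unordered proteome pairs compared in an all-vs-all scheme: n(n-1)/2."""
    return n * (n - 1) // 2


def hierarchical_merges(tree: Tree) -> list[str]:
    """Walk the tree in post-order and log one comparison per internal node,
    between the (pseudo-)proteomes produced for its two children."""
    log: list[str] = []

    def visit(node: Tree) -> str:
        if isinstance(node, str):
            return node  # a leaf is an input proteome
        left = visit(node[0])
        right = visit(node[1])
        log.append(f"compare {left} vs {right}")
        # The merged result stands in for both children at the parent node.
        return f"({left},{right})"

    visit(tree)
    return log


if __name__ == "__main__":
    tree = ((("human", "mouse"), "chicken"), ("fly", "worm"))
    n = len(leaves(tree))
    steps = hierarchical_merges(tree)
    for step in steps:
        print(step)
    print(f"{n} species: {all_vs_all_pairs(n)} all-vs-all pairs, "
          f"{len(steps)} hierarchical comparisons")
    # -> 5 species: 10 all-vs-all pairs, 4 hierarchical comparisons
```

The sketch counts comparison steps only; it does not model the cost of each step, which in practice depends on the sizes of the proteomes and of the ortholog-group representations being compared at each node.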
