SVD-phy: improved prediction of protein functional associations through singular value decomposition of phylogenetic profiles

Summary: A successful approach for predicting functional associations between non-homologous genes is to compare their phylogenetic distributions. We have devised a phylogenetic profiling algorithm, SVD-Phy, which uses truncated singular value decomposition to address the problem of uninformative profiles giving rise to false positive predictions. Benchmarking the algorithm against the KEGG pathway database, we found that it performs substantially better than existing phylogenetic profiling methods.

Availability and implementation: The software is available under the open-source BSD license at https://bitbucket.org/andrea/svd-phy

Contact: lars.juhl.jensen@cpr.ku.dk

Supplementary information: Supplementary data are available at Bioinformatics online.
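The general idea of applying truncated singular value decomposition to phylogenetic profiles can be illustrated with a minimal sketch. This is not the SVD-Phy implementation: the toy presence/absence matrix, the choice of k, the unit-length normalization, the Euclidean distance and the function names `truncated_svd_profiles` and `pairwise_distances` are all assumptions made for illustration. Each row is a hypothetical gene, each column a hypothetical genome.

```python
import numpy as np


def truncated_svd_profiles(profiles, k):
    """Project gene profiles onto the top-k left singular vectors.

    Illustrative only; the normalization used here is an assumption.
    """
    # Thin SVD: profiles = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(profiles, full_matrices=False)
    reduced = U[:, :k]  # keep only the k dominant components
    # Scale each gene's reduced profile to unit length (assumed step).
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    return reduced / norms


def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy profile matrix (hypothetical data): 4 genes x 5 genomes.
profiles = np.array([
    [1, 1, 0, 0, 1],  # gene A
    [1, 1, 0, 0, 1],  # gene B: same distribution as gene A
    [0, 0, 1, 1, 0],  # gene C: complementary distribution
    [1, 1, 1, 1, 1],  # gene D: present everywhere, hence uninformative
], dtype=float)

embedding = truncated_svd_profiles(profiles, k=2)
distances = pairwise_distances(embedding)
```

In this sketch, a small distance between two rows of `embedding` would be read as a candidate functional association; genes A and B have identical profiles and therefore end up at distance zero (up to floating-point error), while gene C lies further away.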
