Protein function prediction with the shortest path in functional linkage graph and boosting

Annotating proteins with biological functions is one of the main goals in post genomic era. Various high-throughout technologies, e.g., yeast two-hybrid systems and microarray, have provided an alternative way to protein function prediction. Despite the success obtained by high-throughout data, the errors in the data have not been handled well. In this work, a new technique for protein function prediction is presented, where a weighted functional linkage graph is generated by exploiting the existing protein-protein interaction data, complex data and gene expression data. By finding the shortest path in the functional linkage graph, the functional links among proteins can be captured. With the functional links available, the functions of unknown proteins can be predicted utilising support vector machines and the functions of those proteins that have functional links to the unknown proteins. In addition, the boosting algorithm is employed to further improve the prediction accuracy. The experiments on yeast genes show promising results and prove the efficiency of the proposed methods.

[1]  Dong Xu,et al.  Global protein function annotation through mining genome-scale data in yeast Saccharomyces cerevisiae. , 2004, Nucleic acids research.

[2]  Gail J. Bartlett,et al.  Effective function annotation through catalytic residue conservation. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[3]  D. Lipman,et al.  Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[4]  T. Takagi,et al.  Assessment of prediction accuracy of protein function from protein–protein interaction data , 2001, Yeast.

[5]  R. Polikar,et al.  Ensemble based systems in decision making , 2006, IEEE Circuits and Systems Magazine.

[6]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[7]  W. Wong,et al.  Transitive functional annotation by shortest-path analysis of gene expression data , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[8]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[9]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[10]  Ting Chen,et al.  Mapping gene ontology to proteins based on protein-protein interaction data , 2004, Bioinform..

[11]  Nello Cristianini,et al.  Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast , 2003, Pacific Symposium on Biocomputing.

[12]  Yudong D. He,et al.  Functional Discovery via a Compendium of Expression Profiles , 2000, Cell.

[13]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[14]  H. Mewes,et al.  The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. , 2004, Nucleic acids research.

[15]  Sung-Hou Kim,et al.  Global mapping of the protein structure space and application in structure-based inference of protein function. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[17]  Stanley Letovsky,et al.  Predicting protein function from protein/protein interaction data: a probabilistic approach , 2003, ISMB.

[18]  Alessandro Vespignani,et al.  Global protein function prediction from protein-protein interaction networks , 2003, Nature Biotechnology.

[19]  Mona Singh,et al.  Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps , 2005, ISMB.

[20]  Robert E. Schapire,et al.  Hierarchical multi-label prediction of gene function , 2006, Bioinform..