Predicting Protein Function by Multi-Label Correlated Semi-Supervised Learning

Assigning biological functions to uncharacterized proteins is a fundamental problem in the postgenomic era. The increasing availability of large amounts of data on protein-protein interactions (PPIs) has led to a considerable number of computational methods for determining protein function in the context of a network. These algorithms, however, treat each functional class in isolation and therefore often suffer from the scarcity of labeled data. In reality, different functional classes are naturally dependent on one another. We propose a new algorithm, Multi-label Correlated Semi-supervised Learning (MCSL), which incorporates the intrinsic correlations among functional classes into protein function prediction by leveraging the relationships provided by both the PPI network and the functional class network. The guiding intuition is that the classification function should be sufficiently smooth on subgraphs where the topologies of these two networks match well. We encode this intuition as regularized learning with intraclass and interclass consistency, which can be understood as an extension of the graph-based learning with local and global consistency (LGC) method. Cross-validation on the yeast proteome shows that MCSL consistently outperforms several state-of-the-art methods. Most notably, it effectively overcomes the problem associated with the scarcity of labeled data. The supplementary files are freely available at http://sites.google.com/site/csaijiang/MCSL.
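For orientation, the LGC baseline that MCSL is said to extend can be sketched in a few lines. This is only the standard single-network closed form, F = (I - alpha S)^{-1} Y with S = D^{-1/2} W D^{-1/2}, and not the MCSL algorithm itself: the interclass term over the functional class network is omitted. The function name `lgc_propagate`, the four-protein toy network, and the value alpha = 0.9 are assumptions made for illustration.

```python
import numpy as np

def lgc_propagate(W, Y, alpha=0.9):
    """Closed-form LGC scores: F = (I - alpha * S)^{-1} Y.

    W: symmetric (n x n) PPI adjacency/weight matrix.
    Y: (n x c) initial label matrix, 1 where a protein is annotated.
    alpha: assumed trade-off in (0, 1) between smoothness and fit.
    """
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    deg = W.sum(axis=1)
    # Leave isolated proteins (degree 0) with a zero scaling factor.
    inv_sqrt = np.zeros_like(deg)
    connected = deg > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    # Symmetrically normalized adjacency S = D^{-1/2} W D^{-1/2}.
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, Y)

# Hypothetical toy network: a chain of four proteins 0-1-2-3.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
# Protein 0 is annotated with class 0, protein 3 with class 1;
# proteins 1 and 2 are unlabeled.
Y = np.array([
    [1, 0],
    [0, 0],
    [0, 0],
    [0, 1],
])

F = lgc_propagate(W, Y)
pred = F.argmax(axis=1)  # highest-scoring class per protein
```

In this toy case each unlabeled protein takes the class of its nearer labeled neighbor. Each column of Y is propagated independently here, which is exactly the per-class isolation that the abstract identifies as a weakness when labels are scarce.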
