Integrating multiple networks for protein function prediction

Background: High-throughput techniques produce multiple functional association networks. Integrating these networks can enhance the accuracy of protein function prediction. Many algorithms have been introduced to generate a composite network, which is obtained as a weighted sum of individual networks. The weight assigned to an individual network reflects its benefit towards the protein functional annotation inference. A classifier is then trained on the composite network for predicting protein functions. However, since these techniques model the optimization of the composite network and the prediction tasks as separate objectives, the resulting composite network is not necessarily optimal for the follow-up protein function prediction.

Results: We address this issue by modeling the optimization of the composite network and the prediction problems within a unified objective function. In particular, we use a kernel target alignment technique and the loss function of a network-based classifier to jointly adjust the weights assigned to the individual networks. Using the multiple networks of four example species (yeast, human, mouse, and fly) annotated with thousands (or hundreds) of GO terms, we show that the proposed method, called MNet, achieves performance superior to that of related techniques with respect to different evaluation criteria.

Conclusion: MNet can effectively integrate multiple networks for protein function prediction and is robust to the input parameters. Supplementary data is available at https://sites.google.com/site/guoxian85/home/mnet. The Matlab code of MNet is available upon request.
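
To make the weighted-sum construction concrete, the following is a minimal sketch in Python with NumPy. It is not the MNet implementation. It illustrates only the simpler two-stage scheme that the abstract contrasts MNet against: each network receives a weight proportional to its kernel-target alignment with the label matrix, and the composite network is the resulting weighted sum. The function names, the clipping of negative alignments at zero, the uniform fallback, and the toy data are all assumptions made for illustration. MNet instead adjusts the weights jointly with the classifier loss, which is not reproduced here.

```python
import numpy as np


def alignment(W, Y):
    """Kernel-target alignment between a network W (n x n) and the
    target kernel Y @ Y.T built from the label matrix Y (n x c)."""
    T = Y @ Y.T
    denom = np.linalg.norm(W) * np.linalg.norm(T)  # Frobenius norms
    return float(np.sum(W * T) / denom) if denom > 0 else 0.0


def composite_network(networks, Y):
    """Return (weights, composite), where composite is the weighted sum
    of the input networks. Weights are proportional to each network's
    alignment, clipped at zero (an illustrative choice, not MNet's)."""
    scores = np.array([max(alignment(W, Y), 0.0) for W in networks])
    if scores.sum() == 0:
        # No network aligns with the labels: fall back to equal weights.
        weights = np.full(len(networks), 1.0 / len(networks))
    else:
        weights = scores / scores.sum()
    composite = sum(a * W for a, W in zip(weights, networks))
    return weights, composite


# Toy data (hypothetical): 4 proteins, 2 functions.
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

# Network A links proteins that share a function.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Network B links only proteins with different functions.
B = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

weights, composite = composite_network([A, B], Y)
```

In this toy case network A agrees with the annotations and network B does not, so the alignment-based weights favor A. Because the weights are fixed before any classifier is trained, nothing in this procedure guarantees that they suit the subsequent prediction step, which is the limitation the unified objective is meant to address.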
