Protein function prediction based on data fusion and functional interrelationship.

One of the challenging tasks in bioinformatics is to predict protein functions accurately and confidently from genomics and proteomics datasets. Computational approaches use a variety of high-throughput experimental data, such as protein-protein interactions (PPI), protein sequences and phylogenetic profiles, to predict protein functions. This paper presents a method that applies a transductive multi-label learning algorithm and integrates multiple data sources for classification. Multiple proteomics datasets are integrated to infer the functions of unknown proteins, and a directed bi-relational graph is used to assign labels to unannotated proteins. Our method, bi-relational graph-based transductive multi-label function annotation (Bi-TMF), uses functional correlations and the topological properties of the PPI network on both the training and testing datasets, and predicts protein functions by fusing the results of the individual kernels. The main purpose of the proposed method is to improve the performance of classifier integration in protein function prediction. Experimental results on multi-source yeast, human and mouse benchmark datasets demonstrate the effectiveness and efficiency of Bi-TMF, which outperforms other recently proposed methods.
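The abstract describes Bi-TMF only at a high level. The sketch below is a rough illustration of the general idea. It runs a generic label propagation (random walk with restart) on a bi-relational graph that joins a protein similarity kernel to a function correlation matrix through the known annotations. It repeats this for each data-source kernel and fuses the per-kernel scores by averaging. The following are all illustrative assumptions, not the authors' exact update rule:

* the cosine-based function correlation,
* the restart weight `alpha`,
* row normalization as the way of obtaining a directed transition matrix,
* the averaging fusion rule,
* every function name.

```python
# Hypothetical sketch, not the authors' Bi-TMF: generic label propagation
# (random walk with restart) on a bi-relational protein/function graph,
# run once per data-source kernel, with late fusion by averaging.
from math import sqrt
from typing import List

Matrix = List[List[float]]


def function_correlation(Y: Matrix) -> Matrix:
    """Assumed correlation: cosine similarity between label columns of Y."""
    n_f = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(n_f)]
    norms = [sqrt(sum(v * v for v in c)) for c in cols]
    C = [[0.0] * n_f for _ in range(n_f)]
    for a in range(n_f):
        for b in range(n_f):
            if a != b and norms[a] > 0 and norms[b] > 0:
                dot = sum(x * y for x, y in zip(cols[a], cols[b]))
                C[a][b] = dot / (norms[a] * norms[b])
    return C


def bi_relational_matrix(W: Matrix, C: Matrix, Y: Matrix) -> Matrix:
    """Block adjacency [[W, Y], [Y^T, C]] over protein and function nodes."""
    n_p, n_f = len(W), len(C)
    top = [list(W[i]) + list(Y[i]) for i in range(n_p)]
    bottom = [[Y[i][j] for i in range(n_p)] + list(C[j]) for j in range(n_f)]
    return top + bottom


def row_normalize(A: Matrix) -> Matrix:
    """Row-stochastic (directed) transition matrix; all-zero rows stay zero."""
    out = []
    for row in A:
        s = sum(row)
        out.append([v / s for v in row] if s > 0 else [0.0] * len(row))
    return out


def propagate(W: Matrix, Y: Matrix, alpha: float = 0.8, iters: int = 50) -> Matrix:
    """Iterate F <- alpha * P F + (1 - alpha) * F0; return protein-by-function scores."""
    n_p, n_f = len(W), len(Y[0])
    P = row_normalize(bi_relational_matrix(W, function_correlation(Y), Y))
    n = len(P)
    # Protein nodes start from their known labels (zero rows if unannotated);
    # function node j starts as an indicator for its own label j.
    F0 = [list(map(float, row)) for row in Y]
    F0 += [[1.0 if k == j else 0.0 for k in range(n_f)] for j in range(n_f)]
    F = [row[:] for row in F0]
    for _ in range(iters):
        F = [
            [alpha * sum(P[i][m] * F[m][k] for m in range(n)) + (1 - alpha) * F0[i][k]
             for k in range(n_f)]
            for i in range(n)
        ]
    return F[:n_p]


def fuse(kernels: List[Matrix], Y: Matrix, alpha: float = 0.8) -> Matrix:
    """Late fusion: average the per-kernel protein-by-function score matrices."""
    results = [propagate(W, Y, alpha) for W in kernels]
    n_p, n_f = len(Y), len(Y[0])
    return [[sum(r[i][k] for r in results) / len(results) for k in range(n_f)]
            for i in range(n_p)]
```

Here is an example call: `fuse([W_ppi, W_seq], Y)`. `W_ppi` and `W_seq` are hypothetical protein-by-protein similarity matrices, for instance a PPI adjacency matrix and normalized sequence-similarity scores. The call returns one fused score per protein and function. Unannotated proteins appear as all-zero rows of `Y`, which is what makes the setting transductive. Averaging is the simplest late-fusion rule; a weighted combination of kernels is a natural alternative.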
