Protein Complex Detection via Weighted Ensemble Clustering Based on Bayesian Nonnegative Matrix Factorization

Detecting protein complexes from protein-protein interaction (PPI) networks is a challenging task in computational biology. A vast number of computational methods have been proposed to undertake this task. However, each computational method is developed to capture one aspect of the network. The performance of different methods on the same network can differ substantially, even the same method may have different performance on networks with different topological characteristic. The clustering result of each computational method can be regarded as a feature that describes the PPI network from one aspect. It is therefore desirable to utilize these features to produce a more accurate and reliable clustering. In this paper, a novel Bayesian Nonnegative Matrix Factorization(NMF)-based weighted Ensemble Clustering algorithm (EC-BNMF) is proposed to detect protein complexes from PPI networks. We first apply different computational algorithms on a PPI network to generate some base clustering results. Then we integrate these base clustering results into an ensemble PPI network, in the form of weighted combination. Finally, we identify overlapping protein complexes from this network by employing Bayesian NMF model. When generating an ensemble PPI network, EC-BNMF can automatically optimize the values of weights such that the ensemble algorithm can deliver better results. Experimental results on four PPI networks of Saccharomyces cerevisiae well verify the effectiveness of EC-BNMF in detecting protein complexes. EC-BNMF provides an effective way to integrate different clustering results for more accurate and reliable complex detection. Furthermore, EC-BNMF has a high degree of flexibility in the choice of base clustering results. It can be coupled with existing clustering methods to identify protein complexes.

[1]  Kara Dolinski,et al.  Gene Ontology annotations at SGD: new data sources and annotation methods , 2007, Nucleic Acids Res..

[2]  Illés J. Farkas,et al.  CFinder: locating cliques and overlapping modules in biological networks , 2006, Bioinform..

[3]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[4]  B. Séraphin,et al.  A generic protein purification method for protein complex characterization and proteome exploration , 1999, Nature Biotechnology.

[5]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[6]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[7]  Santo Fortunato,et al.  Consensus clustering in complex networks , 2012, Scientific Reports.

[8]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[9]  Srinivasan Parthasarathy,et al.  An ensemble framework for clustering protein-protein interaction networks , 2007, ISMB/ECCB.

[10]  Xiaoli Li,et al.  Computational approaches for detecting protein complexes from protein interaction networks: a survey , 2010, BMC Genomics.

[11]  Gary D Bader,et al.  A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules , 2001, Science.

[12]  Youping Deng,et al.  Recent advances in clustering methods for protein interaction networks , 2010, BMC Genomics.

[13]  Gary D. Bader,et al.  An automated method for finding molecular complexes in large protein interaction networks , 2003, BMC Bioinformatics.

[14]  David Botstein,et al.  SGD: Saccharomyces Genome Database , 1998, Nucleic Acids Res..

[15]  Xian-Sheng Hua,et al.  Ensemble Manifold Regularization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Dao-Qing Dai,et al.  Exploring Overlapping Functional Units with Various Structure in Protein Interaction Networks , 2012, PloS one.

[17]  Derek Greene,et al.  Ensemble non-negative matrix factorization methods for clustering protein-protein interactions , 2008, Bioinform..

[18]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[19]  Mona Singh,et al.  How and when should interactome-derived clusters be used to predict functional modules and protein function? , 2009, Bioinform..

[20]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[21]  Dao-Qing Dai,et al.  A Framework for Incorporating Functional Interrelationships into Protein Function Prediction Algorithms , 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[22]  David Mackay,et al.  Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks , 1995 .

[23]  Michael K. Ng,et al.  Agglomerative Fuzzy K-Means Clustering Algorithm with Selection of Number of Clusters , 2008, IEEE Transactions on Knowledge and Data Engineering.

[24]  Gary D Bader,et al.  Analyzing yeast protein–protein interaction data obtained from different sources , 2002, Nature Biotechnology.

[25]  A. Barabasi,et al.  Network biology: understanding the cell's functional organization , 2004, Nature Reviews Genetics.

[26]  Steve Gregory,et al.  Finding overlapping communities in networks by label propagation , 2009, ArXiv.

[27]  Shigehiko Kanaya,et al.  Development and implementation of an algorithm for detection of protein complexes in large interaction networks , 2006, BMC Bioinformatics.

[28]  Ioannis Psorakis,et al.  Soft Partitioning in Networks via Bayesian Non-negative Matrix Factorization , 2010 .

[29]  Kara Dolinski,et al.  The BioGRID Interaction Database: 2011 update , 2010, Nucleic Acids Res..

[30]  Dao-Qing Dai,et al.  Protein Complexes Discovery Based on Protein-Protein Interaction Data via a Regularized Sparse Generative Network Model , 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[31]  Michael L. Creech,et al.  Integration of biological networks and gene expression data using Cytoscape , 2007, Nature Protocols.

[32]  Dmitrij Frishman,et al.  MIPS: analysis and annotation of proteins from whole genomes in 2005 , 2006, Nucleic Acids Res..

[33]  C. Févotte,et al.  Automatic Relevance Determination in Nonnegative Matrix Factorization , 2009 .

[34]  C. Landry,et al.  An in Vivo Map of the Yeast Protein Interactome , 2008, Science.

[35]  Peng Jiang,et al.  SPICi: a fast clustering algorithm for large biological networks , 2010, Bioinform..

[36]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[37]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[38]  Igor Jurisica,et al.  Protein complex prediction via cost-based clustering , 2004, Bioinform..

[39]  Sean R. Collins,et al.  Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae*S , 2007, Molecular & Cellular Proteomics.

[40]  Aidong Zhang,et al.  Semantic integration to identify overlapping functional modules in protein interaction networks , 2007, BMC Bioinformatics.

[41]  Réka Albert,et al.  Near linear time algorithm to detect community structures in large-scale networks. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[42]  Roded Sharan,et al.  Associating Genes and Protein Complexes with Disease via Network Propagation , 2010, PLoS Comput. Biol..

[43]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[44]  C. Févotte,et al.  Automatic Relevance Determination in Nonnegative Matrix Factorization with the-Divergence , 2011 .

[45]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[46]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[47]  Sune Lehmann,et al.  Link communities reveal multiscale complexity in networks , 2009, Nature.

[48]  Xiaomei Quan,et al.  Survey: Functional Module Detection from Protein-Protein Interaction Networks , 2014, IEEE Transactions on Knowledge and Data Engineering.

[49]  Anil K. Jain,et al.  Clustering ensembles: models of consensus and weak partitions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Hyunsoo Kim,et al.  Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares , 2006 .

[51]  Kahn Rhrissorrakrai,et al.  MINE: Module Identification in Networks , 2011, BMC Bioinformatics.

[52]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[53]  Anton J. Enright,et al.  An efficient algorithm for large-scale detection of protein families. , 2002, Nucleic acids research.

[54]  Jacques van Helden,et al.  Evaluation of clustering algorithms for protein-protein interaction networks , 2006, BMC Bioinformatics.

[55]  Alain Guénoche,et al.  Multifunctional proteins revealed by overlapping clustering in protein interaction network , 2011, Bioinform..

[56]  Yanjun Qi,et al.  Protein complex identification by supervised graph local clustering , 2008, ISMB.

[57]  Guimei Liu,et al.  Complex discovery from weighted PPI networks , 2009, Bioinform..

[58]  Haiyuan Yu,et al.  Detecting overlapping protein complexes in protein-protein interaction networks , 2012, Nature Methods.

[59]  A. Barabasi,et al.  Hierarchical Organization of Modularity in Metabolic Networks , 2002, Science.