Protein Complex Detection via Effective Integration of Base Clustering Solutions and Co-Complex Affinity Scores

With the increasing availability of protein interaction data, various computational methods have been developed to predict protein complexes. However, each method has its own advantages and limitations. Ensemble clustering has therefore been studied as a way to reduce the bias and risk of individual methods and to generate predictions with better coverage and accuracy. In this paper, we extend traditional ensemble clustering by taking co-complex affinity scores into account, and we present an Ensemble Hierarchical Clustering framework (EnsemHC) to detect protein complexes. First, we construct co-cluster matrices by integrating the base clustering results with the co-complex evidence. Second, we sum the constructed co-cluster matrices to derive a final ensemble matrix via a novel iterative weighting scheme. Finally, we apply hierarchical clustering to generate protein complexes from the final ensemble matrix. Experimental results demonstrate that EnsemHC performs better than its base clustering methods and various existing integrative methods. In addition, we observed that integrating clusters and co-complex affinity scores from different data sources improves prediction performance; for example, integrating the clusters from TAP data with co-complex affinities from binary PPI data achieved the best performance in our experiments.
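The three steps described above can be illustrated with a minimal Python sketch. It is not the EnsemHC implementation: the abstract does not specify how clusters and affinities are combined, how the iterative weights are updated, or which linkage and stopping rule the hierarchical clustering uses. The sketch assumes (1) a co-cluster matrix keeps the affinity score of a protein pair only when a base clustering places both proteins in the same cluster, (2) weights are repeatedly reset in proportion to each matrix's cosine agreement with the current ensemble, and (3) average-linkage merging stops at a hypothetical similarity `threshold`. All function names are illustrative.

```python
from itertools import combinations

def co_cluster_matrix(n, clusters, affinity):
    """Assumed rule: keep affinity[i][j] only for pairs co-clustered by a base method."""
    m = [[0.0] * n for _ in range(n)]
    for cluster in clusters:
        for i, j in combinations(sorted(cluster), 2):
            m[i][j] = m[j][i] = affinity[i][j]
    return m

def weighted_sum(matrices, weights):
    """Entry-wise weighted sum of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights))
             for j in range(n)] for i in range(n)]

def cosine(a, b):
    """Cosine similarity between two matrices treated as flat vectors."""
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    na = sum(x * x for r in a for x in r) ** 0.5
    nb = sum(x * x for r in b for x in r) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def ensemble_matrix(matrices, iterations=10):
    """Stand-in iterative weighting: reweight each matrix by agreement with the ensemble."""
    weights = [1.0 / len(matrices)] * len(matrices)
    for _ in range(iterations):
        ensemble = weighted_sum(matrices, weights)
        scores = [cosine(m, ensemble) for m in matrices]
        total = sum(scores)
        if total == 0:
            break
        weights = [s / total for s in scores]
    return weighted_sum(matrices, weights), weights

def hierarchical_clusters(ensemble, threshold):
    """Average-linkage agglomeration; stop when the best merge falls below threshold."""
    clusters = [{i} for i in range(len(ensemble))]

    def link(a, b):
        return sum(ensemble[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[x], clusters[y]) < threshold:
            break
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return [c for c in clusters if len(c) > 1]  # singletons are not complexes

# Toy input: five proteins, two base clusterings, uniform affinity scores.
n = 5
affinity = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
base_a = [{0, 1, 2}, {3, 4}]
base_b = [{0, 1}, {2, 3, 4}]
matrices = [co_cluster_matrix(n, c, affinity) for c in (base_a, base_b)]
ensemble, weights = ensemble_matrix(matrices)
complexes = hierarchical_clusters(ensemble, threshold=0.6)
print(weights, complexes)
```

In this toy input, protein 2 is assigned to different clusters by the two base methods, so its ensemble scores are weaker than those of pairs on which both methods agree; whether it joins a predicted complex then depends on the chosen threshold.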
