Framework to Identify Protein Complexes Based on Similarity Preclustering

Proteins interact with each other to form protein complexes, and cell functionality depends on both protein interactions and these complexes. Based on the assumption that protein complexes are highly connected and correspond to dense regions in protein-protein interaction networks (PINs), many methods have been proposed to identify such dense regions. Because protein complexes may be formed by proteins with similar properties, such as topological and functional properties, we propose a protein complex identification framework, KCluster, that first groups similar proteins. In KCluster, a PIN is divided into K subnetworks using a K-means algorithm, so that each subnetwork comprises proteins of similar degree. We adopt a strategy based on the expected number of common neighbors to detect the protein complexes in each subnetwork. Moreover, we identify protein complexes that span two subnetworks by combining closely linked protein complexes from different subnetworks. Finally, we refine the predicted protein complexes using protein subcellular localization information. We apply KCluster and nine existing methods to identify protein complexes from a highly reliable yeast PIN. The results show that KCluster achieves higher Sn, Sp, and F-measure values than the other nine methods. Furthermore, the number of perfect matches predicted by KCluster is significantly higher than that of the other nine methods.
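
The preclustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the only clustering feature is node degree, that K-means is run on those scalar degrees with a simple deterministic initialization, and that each subnetwork keeps only the edges whose two endpoints fall in the same cluster. The function names (`degrees`, `kmeans_1d`, `partition_by_degree`) and the toy network are hypothetical.

```python
from collections import defaultdict


def degrees(edges):
    """Return {protein: degree} for an undirected edge list (self-loops ignored)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {p: len(ns) for p, ns in neighbors.items()}


def kmeans_1d(values, k, iterations=100):
    """Lloyd's K-means on scalars with a quantile-style initialization (an assumption)."""
    ordered = sorted(values)
    # Spread the k initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers


def partition_by_degree(edges, k):
    """Split a PIN into k subnetworks of proteins with similar degree."""
    deg = degrees(edges)
    centers = kmeans_1d(list(deg.values()), k)
    label = {p: min(range(k), key=lambda c: abs(d - centers[c])) for p, d in deg.items()}
    subnetworks = [{"proteins": set(), "edges": []} for _ in range(k)]
    for p, c in label.items():
        subnetworks[c]["proteins"].add(p)
    for u, v in edges:
        # Keep only edges internal to one subnetwork (an assumption of this sketch).
        if u != v and label[u] == label[v]:
            subnetworks[label[u]]["edges"].append((u, v))
    return subnetworks


# Toy network: a 4-protein clique (degree 3 each) and two isolated pairs (degree 1 each).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
             ("E", "F"), ("G", "H")]
toy_subnetworks = partition_by_degree(toy_edges, k=2)
```

On the toy network, the clique proteins and the low-degree pairs land in separate subnetworks. Edges that cross subnetworks are dropped here; in KCluster, complexes spanning two subnetworks are recovered in a later merging step that this sketch does not cover.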
