Double-layer clustering method to predict protein complexes based on power-law distribution and protein sublocalization

Identifying protein complexes from protein-protein interaction networks (PINs) is fundamental for understanding protein functions and activities in the cell. Based on the assumption that protein complexes are highly connected regions of PINs, many algorithms have been proposed to identify protein complexes from PINs. However, most of these approaches neglect two facts: not all proteins in a complex are highly connected, and proteins with different topological properties in a PIN may form complexes in different ways and should therefore be treated differently. In this paper, we propose a double-layer clustering method based on the power-law distribution (PLCluster). To calculate the centrality scores of nodes, we propose a Dense-Spread Centrality method, whose scores follow a power-law distribution. Based on this distribution, PLCluster divides the nodes into two categories: nodes with very high centrality scores and nodes with lower centrality scores. Different strategies are then applied to the nodes in each category to detect protein complexes from the PIN. Furthermore, predicted complexes whose proteins do not share a common subcellular compartment are filtered out, since all proteins in a protein complex should reside in the same compartment. Compared with nine other existing methods on a highly reliable yeast PIN, PLCluster shows clear advantages in the number of known complexes identified, sensitivity, specificity, F-measure and the number of perfect matches.
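The subcellular-localization filter described above can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' implementation. Each protein is assumed to map to a set of compartment labels, and a predicted complex is kept when all of its annotated proteins share at least one compartment. Proteins with no annotation are skipped rather than counted against the complex; the paper may handle them differently. The function names `is_localization_consistent` and `filter_by_localization` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def is_localization_consistent(
    complex_proteins: Iterable[str],
    localization: Dict[str, Set[str]],
) -> bool:
    """Return True if all annotated proteins share at least one compartment.

    Assumption: proteins without any annotation are skipped, because they
    give no evidence that the complex spans incompatible compartments.
    """
    common: Optional[Set[str]] = None
    for protein in complex_proteins:
        compartments = localization.get(protein)
        if not compartments:
            continue  # unannotated protein: no evidence either way
        # Running intersection of compartments over the annotated members.
        common = set(compartments) if common is None else common & compartments
        if not common:
            return False  # annotated members have no compartment in common
    return True


def filter_by_localization(
    predicted: List[Set[str]],
    localization: Dict[str, Set[str]],
) -> List[Set[str]]:
    """Keep only the predicted complexes that pass the consistency check."""
    return [c for c in predicted if is_localization_consistent(c, localization)]


if __name__ == "__main__":
    # Toy annotations; protein names and compartment labels are illustrative.
    loc = {
        "A": {"nucleus"},
        "B": {"nucleus", "cytoplasm"},
        "C": {"cytoplasm"},
        "D": {"mitochondrion"},
    }
    predicted = [{"A", "B"}, {"B", "C"}, {"A", "D"}, {"B", "X"}]
    # {"A", "D"} is dropped; {"B", "X"} is kept because "X" is unannotated.
    print(filter_by_localization(predicted, loc))
```

Using a running set intersection makes the check linear in the size of the complex and lets it stop as soon as two annotated members are found to have no compartment in common.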
