Weighted edge based clustering to identify protein complexes in protein-protein interaction networks incorporating gene expression profile

Protein complex detection from protein-protein interaction (PPI) networks has received considerable attention in recent years. A number of methods identify protein complexes as dense sub-graphs using network information, while several others detect protein complexes based on topological information. Although methods based on identifying dense sub-graphs are more effective at identifying protein complexes, not all protein complexes have high density. Moreover, existing methods focus mostly on static PPI networks and usually overlook the dynamic nature of protein complexes. Here, we propose a new method, Weighted Edge based Clustering (WEC), which identifies protein complexes based on the weight of the edge between two interacting proteins, where the weight is defined by the edge clustering coefficient and the gene expression correlation between the interacting proteins. WEC is capable of detecting highly inter-connected and co-expressed protein complexes. Experimental results on three real-life datasets show that WEC detects protein complexes effectively in comparison with other highly cited existing methods.

AVAILABILITY: The WEC tool is available at http://agnigarh.tezu.ernet.in/~rosy8/shared.html.
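To make the edge-weighting idea concrete, the following is a minimal sketch rather than the WEC implementation. It assumes the commonly used definition of the edge clustering coefficient (triangles through an edge divided by the maximum number possible given the endpoint degrees) and Pearson correlation as the expression similarity. The abstract does not state how the two quantities are combined, so the plain sum in `edge_weight` is an assumption for illustration only; all function names and the toy data are hypothetical.

```python
from math import sqrt


def edge_clustering_coefficient(adj, u, v):
    """Triangles containing edge (u, v) divided by the maximum possible number.

    `adj` maps each protein to the set of its interaction partners.
    """
    common = len(adj[u] & adj[v])  # each shared neighbour closes one triangle
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / possible if possible > 0 else 0.0


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def edge_weight(adj, expr, u, v):
    """ASSUMPTION: weight = ECC + expression correlation (not from the source)."""
    return edge_clustering_coefficient(adj, u, v) + pearson(expr[u], expr[v])


# Toy network (hypothetical): triangle A-B-C with D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
# Toy expression profiles over four time points (hypothetical values).
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [4.0, 3.0, 2.0, 1.0],
}

for u, v in [("A", "B"), ("C", "D")]:
    print(u, v, round(edge_weight(adj, expr, u, v), 3))
```

In this toy example the edge A-B lies in a triangle and joins co-expressed proteins, so it receives a high weight, whereas C-D lies in no triangle and joins anti-correlated proteins, so it receives a low one. A clustering step would then favour edges of the first kind when growing candidate complexes.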
