Detecting Protein Complexes in Protein Interaction Networks Modeled as Gene Expression Biclusters

Developing suitable methods for detecting protein complexes in protein interaction networks remains an active area of research. The objective matters because protein complexes are key players in most cellular processes: the more complexes we identify, the better we can understand normal as well as abnormal molecular events. To date, various computational methods have been designed for this purpose. Despite their notable performance, questions remain about how they can be improved and what guidelines should inform novel approaches. A closer look suggests that the way protein interaction networks are initially viewed should be adjusted. These networks are dynamic in reality, and taking this into account is necessary to enhance the detection of protein complexes. In this paper, we present “DyCluster”, a framework that models the dynamic aspect of protein interaction networks by incorporating gene expression data, through biclustering techniques, prior to applying complex-detection algorithms. The experimental results show that DyCluster leads to higher numbers of correctly detected complexes with better evaluation scores. This accuracy in detecting protein complexes supports the proposed method. DyCluster is also able to detect biologically meaningful protein groups. The code and datasets used in the study are downloadable from https://github.com/emhanna/DyCluster.
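
The pipeline described above (bicluster the expression data, restrict the interaction network to each bicluster, then run a complex-detection algorithm) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DyCluster implementation from the repository: the function names, the toy network, the `min_size` threshold, and the use of connected components as a stand-in detector are all hypothetical, and biclusters are assumed to be already computed and given as sets of gene names that map one-to-one onto protein names.

```python
from collections import defaultdict


def induced_subnetwork(edges, proteins):
    """Keep only interactions whose two endpoints both lie in `proteins`."""
    keep = set(proteins)
    return [(a, b) for a, b in edges if a in keep and b in keep]


def connected_components(edges):
    """Placeholder detector (assumption): treat each connected component
    as a candidate complex. A real complex-detection algorithm would be
    plugged in here instead."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


def detect_on_biclusters(edges, biclusters, detector=connected_components, min_size=3):
    """Run `detector` on the subnetwork induced by each bicluster and
    collect the distinct candidate complexes with at least `min_size` proteins."""
    found = set()
    for genes in biclusters:
        subnetwork = induced_subnetwork(edges, genes)
        for candidate in detector(subnetwork):
            if len(candidate) >= min_size:
                found.add(frozenset(candidate))
    return found


# Toy data (hypothetical): two triangles joined by the interaction C-D.
ppi = [("A", "B"), ("B", "C"), ("A", "C"),
       ("C", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
# Hypothetical biclusters: genes co-expressed under some subset of conditions.
biclusters = [{"A", "B", "C"}, {"D", "E", "F"}, {"C", "D"}]

complexes = detect_on_biclusters(ppi, biclusters)
```

On the static toy network, a connectivity-based detector would see a single six-protein component; restricting the network to co-expressed groups first is what separates the two triangles. The `detector` argument is left pluggable because the framework is described as a step applied before existing complex-detection algorithms rather than a replacement for them.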
