Mining hub-based protein complexes in massive biological networks

Advanced technologies are producing large-scale protein-protein interaction (PPI) data at an ever-increasing pace. Identifying protein complexes in large PPI networks is a fundamental problem in bioinformatics. Hub proteins, core proteins that interact with many other proteins, play a key role in protein complexes and in life activities. In this paper, we propose a novel topological model, HP*-complex, which defines the hub proteins of a protein complex and extends to encompass their neighborhood, as the initial structure of protein complexes. We develop an algorithm based on this model, called HPCMiner, for identifying protein complexes in large PPI networks. Experimental results on a real dataset show that the proposed algorithm detects many complexes of particular biological significance. Results on synthetic datasets demonstrate that HPCMiner scales well with dataset size.
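The idea of seeding a complex with hub proteins and their neighborhood can be illustrated with a minimal sketch. It is not the HP*-complex definition or the HPCMiner algorithm, neither of which is specified in this abstract. It assumes, purely for illustration, that a hub is any protein whose degree meets a user-chosen threshold; the names `build_adjacency`, `hub_seeds`, and `min_degree` and the toy network are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def hub_seeds(edges, min_degree):
    """Map each hub to the set containing the hub and its direct neighbors.

    Assumption: a hub is a protein with degree >= min_degree. This is a
    stand-in criterion, not the definition used by the HP*-complex model.
    """
    adj = build_adjacency(edges)
    return {
        protein: {protein} | neighbors
        for protein, neighbors in adj.items()
        if len(neighbors) >= min_degree
    }


# Toy PPI network with made-up protein names.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("D", "E"), ("E", "F"),
]

seeds = hub_seeds(toy_edges, min_degree=3)
print(seeds)
```

With this toy network and `min_degree=3`, only protein `A` qualifies as a hub, and its seed consists of `A` together with `B`, `C`, and `D`. A full method would refine such seeds further, for example by filtering or merging them.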
