Functional module identification by block modeling using simulated annealing with path relinking

Identifying functional modules and understanding their organization in biological networks is of great importance. Recently, module identification by block modeling has demonstrated advantages over existing algorithms that consider only topologically "cohesive" modules. In this paper, we aim to identify biologically meaningful functional modules by considering not only topologically "cohesive" modules but also modules whose nodes are sparsely connected yet share similar interaction patterns. Within the adopted block modeling framework, we propose a novel, efficient optimization algorithm that combines Simulated Annealing (SA) and Path Relinking (PR) to solve this difficult combinatorial optimization problem. We evaluated the performance of our algorithm on a set of synthetic benchmark networks and a human protein-protein interaction (PPI) network. Our results show that the new SAPR algorithm achieves higher accuracy than existing state-of-the-art algorithms. It also requires significantly less computation time than the traditional SA algorithm while achieving competitive accuracy. Preliminary results for identifying functional modules in the human PPI network, together with a comparison against the commonly adopted Markov Clustering (MCL) algorithm, demonstrate the potential of our algorithm to discover new types of modules, within which proteins are sparsely connected but have significantly enriched biological functionalities.
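The abstract does not give the block-model objective or the exact SA and PR schedules, so the following is only a minimal sketch of how SA and path relinking can be combined for block modeling, under stated assumptions. The objective is a hypothetical squared-error fit of a binary adjacency matrix by block-pair edge densities. SA reassigns one node at a time using Metropolis acceptance and geometric cooling. Path relinking greedily walks from each new SA solution toward label-aligned elite solutions and keeps the best intermediate. All function names and parameters (`block_model_cost`, `simulated_annealing`, `align_labels`, `path_relink`, `sapr`, `t0`, `cooling`, `steps`, `rounds`, `pool_size`) are hypothetical illustrations and are not taken from the paper.

```python
import math
import random
from itertools import product


def block_model_cost(adj, labels, k):
    # Hypothetical objective: approximate each off-diagonal A[i][j] by the edge
    # density of block pair (labels[i], labels[j]) and sum the squared residuals.
    # Assumes a binary 0/1 adjacency matrix; lower is better.
    edges = [[0] * k for _ in range(k)]
    sizes = [0] * k
    for i, row in enumerate(adj):
        sizes[labels[i]] += 1
        for j, a_ij in enumerate(row):
            if i != j:
                edges[labels[i]][labels[j]] += a_ij
    cost = 0.0
    for r, s in product(range(k), repeat=2):
        pairs = sizes[r] * sizes[s] - (sizes[r] if r == s else 0)
        if pairs > 0:
            p = edges[r][s] / pairs
            cost += pairs * p * (1 - p)  # SSE of 0/1 entries around their mean p
    return cost


def simulated_annealing(adj, k, labels, rng, t0=1.0, cooling=0.95, steps=200):
    # Single-node reassignment moves, Metropolis acceptance, geometric cooling.
    current = labels[:]
    cur_cost = block_model_cost(adj, current, k)
    best, best_cost, t = current[:], cur_cost, t0
    for _ in range(steps):
        i, new = rng.randrange(len(adj)), rng.randrange(k)
        old = current[i]
        if new != old:
            current[i] = new
            new_cost = block_model_cost(adj, current, k)
            delta = new_cost - cur_cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur_cost = new_cost
                if cur_cost < best_cost:
                    best, best_cost = current[:], cur_cost
            else:
                current[i] = old  # reject: undo the move
        t *= cooling
    return best, best_cost


def align_labels(ref, other, k):
    # Block labels are interchangeable, so greedily relabel `other` to overlap `ref`.
    overlap = [[0] * k for _ in range(k)]
    for a, b in zip(ref, other):
        overlap[b][a] += 1
    mapping, used = {}, set()
    for b, a in sorted(product(range(k), repeat=2), key=lambda p: -overlap[p[0]][p[1]]):
        if b not in mapping and a not in used:
            mapping[b] = a
            used.add(a)
    return [mapping[b] for b in other]


def path_relink(adj, k, start, guide):
    # Walk from `start` toward `guide`, each step copying the one differing label
    # that gives the lowest cost, and keep the best intermediate solution.
    current = start[:]
    best, best_cost = current[:], block_model_cost(adj, current, k)
    diff = [i for i in range(len(start)) if start[i] != guide[i]]
    while diff:
        scored = []
        for i in diff:
            old, current[i] = current[i], guide[i]
            scored.append((block_model_cost(adj, current, k), i))
            current[i] = old
        cost, i = min(scored)
        current[i] = guide[i]
        diff.remove(i)
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost


def sapr(adj, k, rounds=5, pool_size=3, seed=0):
    # Hypothetical SA + PR loop: anneal from a random start, relink toward each
    # elite solution, then keep the `pool_size` best solutions found so far.
    rng = random.Random(seed)
    elite = []  # list of (cost, labels)
    for _ in range(rounds):
        start = [rng.randrange(k) for _ in range(len(adj))]
        sol, cost = simulated_annealing(adj, k, start, rng)
        for _, guide in elite:
            cand, c = path_relink(adj, k, sol, align_labels(sol, guide, k))
            if c < cost:
                sol, cost = cand, c
        elite = sorted(elite + [(cost, sol)], key=lambda e: e[0])[:pool_size]
    return elite[0][1], elite[0][0]


if __name__ == "__main__":
    # Toy K_{4,4}: nodes 0-3 have no internal edges but all link to nodes 4-7,
    # so each side is a sparse module whose members share an interaction pattern.
    adj = [[int((i < 4) != (j < 4)) for j in range(8)] for i in range(8)]
    labels, cost = sapr(adj, k=2)
    print(labels, cost)
```

Under the assumed objective, the planted split of this toy graph ({0..3} vs {4..7}) has a cost of exactly 0, because every block pair is either empty or complete. This holds even though neither side is topologically cohesive, which is the kind of module the abstract targets. Label alignment before relinking is a design choice: without it, two equivalent partitions with swapped labels would look maximally different, and the relinking path would be needlessly long.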
