Finding Functional Modules

Biological systems can be modeled as complex network systems with many interactions between the components. These interactions give rise to the function and behavior of that system. For example, the protein–protein interaction (PPI) network is the physical basis of multiple cellular functions. One goal of emerging systems biology is to analyze very large complex biological networks such as protein–protein interaction networks, metabolic networks, and regulatory networks to identify functional modules and assign functions to certain components of the system. Network modules do not occur by chance, so identification of modules is likely to capture the biologically meaningful interactions in large-scale PPI data. Unfortunately, existing computer-based clustering methods developed to find those modules are either not so accurate or are too slow. We devised a new methodology called SCAN (Structural Clustering Algorithm for Networks) that can efficiently find clusters or functional modules in complex biological networks as well as hubs and outliers. More specifically, we demonstrated that we can find functional modules in complex networks and classify nodes into various roles based on their structures. In this chapter, we show the effectiveness of our methodology using the budding yeast (Saccharomyces cerevisiae) PPI network. To validate our clustering results, we compared our clusters with the known functions of each protein. Our predicted functional modules achieved very high purity comparing with state-of-the-art approaches. Additionally the theoretical and empirical analysis demonstrated a linear running time of the algorithm, which is the fastest approach for networks. We compare our algorithm with well-known modularity-based clustering algorithm CNM. We successfully detect functional groups that are annotated with putative Gene Ontology (GO) terms. Top-10 clusters with minimum p-value theoretically prove that newly proposed algorithm partitions network more accurately then CNM. Furthermore, manual interpretations of functional groups found by SCAN show superior performance over CNM.

[1]  A. Barabasi,et al.  Network biology: understanding the cell's functional organization , 2004, Nature Reviews Genetics.

[2]  S. Strogatz Exploring complex networks , 2001, Nature.

[3]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[4]  Igor Jurisica,et al.  Protein complex prediction via cost-based clustering , 2004, Bioinform..

[5]  A. Barabasi,et al.  Functional and topological characterization of protein interaction networks , 2004, Proteomics.

[6]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[7]  Lani F. Wu,et al.  Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters , 2002, Nature Genetics.

[8]  S. L. Wong,et al.  A Map of the Interactome Network of the Metazoan C. elegans , 2004, Science.

[9]  Chris H. Q. Ding,et al.  A min-max cut algorithm for graph partitioning and data clustering , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[10]  Xiaowei Xu,et al.  SCAN: a structural clustering algorithm for networks , 2007, KDD '07.

[11]  Ignacio Marín,et al.  Iterative Cluster Analysis of Protein Interaction Data , 2005, Bioinform..

[12]  R. Guimerà,et al.  Functional cartography of complex metabolic networks , 2005, Nature.

[13]  Srinivasan Parthasarathy,et al.  Improving Functional Modularity in Protein-Protein Interactions Graphs Using Hub-Induced Subgraphs , 2006, PKDD.

[14]  James R. Knight,et al.  A Protein Interaction Map of Drosophila melanogaster , 2003, Science.

[15]  D. Fell,et al.  The small world inside large metabolic networks , 2000, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[16]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[17]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[18]  Michalis Faloutsos,et al.  On power-law relationships of the Internet topology , 1999, SIGCOMM '99.

[19]  Igor Jurisica,et al.  Functional topology in a network of protein interactions , 2004, Bioinform..

[20]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  R. Albert,et al.  The large-scale organization of metabolic networks , 2000, Nature.

[22]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[23]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.