Clustering complex networks and biological networks by nonnegative matrix factorization with various similarity measures

Identifying community structure in complex networks is closely related to clustering data in areas that have no underlying network structure. In this paper, we propose a nonnegative matrix factorization (NMF)-based method for finding community structure. We first evaluate several similarity measures, such as diffusion kernel similarity and shortest-path-based similarity, on several well-studied networks. We then apply NMF with diffusion kernel similarity to a large biological network; the results demonstrate that our method can find biologically meaningful functional modules. Comparison with other algorithms also indicates that our method performs well.
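
To make the pipeline concrete, the following is a minimal sketch of NMF-based community detection on a diffusion kernel similarity matrix. It is an illustration under stated assumptions, not necessarily the exact procedure used in the paper: the kernel is taken to be K = exp(-beta * L) with L the graph Laplacian, the factorization uses the standard Lee and Seung multiplicative updates for the Frobenius-norm objective, and each node is assigned to the factor with the largest coefficient. The function names, the value beta = 0.5, the iteration count, and the toy graph are all hypothetical choices made for this example.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Return K = exp(-beta * L) for the graph Laplacian L = D - A.

    beta is an assumed diffusion parameter; the abstract does not give a value.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # L is symmetric, so the matrix exponential follows from its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    kernel = (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T
    # Clip tiny negative entries caused by floating-point round-off.
    return np.clip(kernel, 0.0, None)


def nmf(matrix, rank, n_iter=2000, seed=0, eps=1e-12):
    """Factor matrix ~ W @ H with nonnegative W, H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = matrix.shape
    w = rng.random((n_rows, rank)) + eps
    h = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Lee-Seung updates for the squared Frobenius error ||matrix - W H||^2.
        h *= (w.T @ matrix) / (w.T @ w @ h + eps)
        w *= (matrix @ h.T) / (w @ h @ h.T + eps)
    return w, h


def nmf_communities(adjacency, n_communities, beta=0.5, seed=0):
    """Assign each node to the factor with the largest coefficient in H."""
    kernel = diffusion_kernel(adjacency, beta)
    _, h = nmf(kernel, n_communities, seed=seed)
    return h.argmax(axis=0)


def two_cliques(size=4):
    """Toy graph (hypothetical): two cliques joined by a single bridge edge."""
    n = 2 * size
    a = np.zeros((n, n))
    a[:size, :size] = 1.0
    a[size:, size:] = 1.0
    np.fill_diagonal(a, 0.0)
    a[size - 1, size] = a[size, size - 1] = 1.0
    return a


if __name__ == "__main__":
    labels = nmf_communities(two_cliques(), n_communities=2)
    print(labels)
```

Other similarity measures, such as one derived from shortest-path distances, could be substituted for `diffusion_kernel` without changing the factorization step, provided the resulting matrix is nonnegative.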