An MDL approach to efficiently discover communities in bipartite network

A minimum description length (MDL) criterion is proposed for choosing a good partition of a bipartite network. A heuristic algorithm based on combination theory is presented to approximate the optimal partition. Because the heuristic algorithm automatically searches for the number of partitions, no user intervention is required. Experiments on various datasets show that the method generates higher-quality results than two state-of-the-art methods, cross-association and bipartite recursively induced modules, and that the proposed algorithm scales well. The method is also applied to a traditional Chinese medicine (TCM) formula and Chinese herbal network whose community structure is not well known, where it detects a significant and informative community division.
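To illustrate how an MDL criterion can rank candidate partitions of a bipartite network, the following is a minimal sketch of a generic two-part description length (model bits plus data bits) for a binary biadjacency matrix. It is not the criterion proposed in the paper, whose exact form is not given in this abstract. The function names, the label-cost term, and the per-block density cost are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def description_length(
    adj: Sequence[Sequence[int]],
    row_groups: Sequence[int],
    col_groups: Sequence[int],
) -> float:
    """Illustrative two-part code length (bits) of a bipartite partition.

    adj        : binary biadjacency matrix (rows = one node type,
                 columns = the other node type)
    row_groups : group label (0..k-1) for each row node
    col_groups : group label (0..l-1) for each column node

    Assumed cost model (hypothetical, not taken from the paper):
      model bits = group label per node + edge count per block
      data bits  = block size * entropy of the block's edge density
    """
    n_rows, n_cols = len(adj), len(adj[0])
    k, l = max(row_groups) + 1, max(col_groups) + 1

    # Cost of stating which group every node belongs to.
    model_bits = n_rows * math.log2(k) + n_cols * math.log2(l)

    # Group sizes and number of edges inside each (row group, column group) block.
    row_sizes: List[int] = [0] * k
    col_sizes: List[int] = [0] * l
    for g in row_groups:
        row_sizes[g] += 1
    for g in col_groups:
        col_sizes[g] += 1
    ones = [[0] * l for _ in range(k)]
    for i, row in enumerate(adj):
        for j, value in enumerate(row):
            if value:
                ones[row_groups[i]][col_groups[j]] += 1

    data_bits = 0.0
    for a in range(k):
        for b in range(l):
            cells = row_sizes[a] * col_sizes[b]
            if cells == 0:
                continue
            # Cost of stating how many edges the block contains.
            model_bits += math.log2(cells + 1)
            # Cost of stating where those edges are, given the block density.
            data_bits += cells * binary_entropy(ones[a][b] / cells)

    return model_bits + data_bits


# Toy network with two planted communities (two dense 3x3 blocks).
toy = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
planted = description_length(toy, [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1])
single = description_length(toy, [0] * 6, [0] * 6)
print(f"planted partition: {planted:.2f} bits, single group: {single:.2f} bits")
```

Under this assumed cost model, the planted two-community partition has a shorter description than placing all nodes in one group, which is the kind of comparison a search heuristic would use when deciding how many groups to keep.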
