Exploring triad-rich substructures by graph-theoretic characterizations in complex networks

One of the most important problems in complex networks is how to detect communities accurately. The main challenge is that traditional definitions of communities do not always capture their intrinsic features. Motivated by the observation that communities in protein-protein interaction (PPI) networks tend to contain an abundance of interacting triad motifs, we describe a community as a triad-rich 2-club substructure, that is, a substructure with diameter 2. Based on this triad-rich substructure, we design DIVANC, a DIVision Algorithm using our proposed edge Niche Centrality, to detect communities effectively in complex networks. We also extend DIVANC to detect overlapping communities by proposing a simple 2-hop overlapping strategy. To verify the effectiveness of triad-rich substructures, we compare DIVANC with existing algorithms on PPI networks, LFR synthetic networks and football networks. The experimental results show that DIVANC significantly outperforms most of the other algorithms and, in particular, can detect sparse communities.
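As a rough illustration of the substructure described in the abstract, the following Python sketch checks whether a node set induces a 2-club and counts the triangles it contains. It is not the DIVANC algorithm, and the edge niche centrality is not reproduced here because the abstract does not define it. The function names are hypothetical, a 2-club is taken in the usual graph-theoretic sense (an induced subgraph of diameter at most 2), and "triad" is read as a closed triangle, which is an assumption.

```python
from itertools import combinations


def induced_adjacency(edges, nodes):
    """Return adjacency sets of the subgraph induced by `nodes` (undirected)."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_2_club(edges, nodes):
    """True if every pair of nodes is adjacent or shares a neighbour
    inside the induced subgraph, i.e. the induced diameter is at most 2."""
    adj = induced_adjacency(edges, nodes)
    for u, v in combinations(adj, 2):
        if v not in adj[u] and not (adj[u] & adj[v]):
            return False
    return True


def count_triangles(edges, nodes):
    """Count closed triangles in the induced subgraph
    (assumption: a 'triad' is interpreted as a triangle)."""
    adj = induced_adjacency(edges, nodes)
    return sum(
        1
        for a, b, c in combinations(adj, 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


# Hypothetical toy graphs: a triangle with one pendant node, and a 4-node path.
triangle_plus_tail = [(1, 2), (2, 3), (1, 3), (3, 4)]
path = [(1, 2), (2, 3), (3, 4)]

print(is_2_club(triangle_plus_tail, {1, 2, 3, 4}))
print(count_triangles(triangle_plus_tail, {1, 2, 3, 4}))
print(is_2_club(path, {1, 2, 3, 4}))
```

The pairwise check is quadratic in the size of the node set, which is adequate for inspecting a single candidate community; how "triad-rich" should be thresholded is left open, since the abstract gives no criterion.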
