Defining and identifying cograph communities in complex networks

Community or module detection is a fundamental problem in complex networks. Most traditional algorithms look only for subgraphs whose vertices are densely connected among themselves and loosely connected to the rest of the network, ignoring the topological structure inside the community. In most cases, however, the interior topology of a community must be analyzed further to obtain meaningful subgroups. We therefore propose a novel type of community, the cograph community. Cographs have a well-understood structure, and their cotree representation allows structurally equivalent subgroups to be identified immediately. To detect cograph communities we develop the Edge P4 centrality-based divisive algorithm (EPCA), which is efficient, parameter-free and independent of additional measures, mainly because of the novel local edge P4 centrality measure. We compare the EPCA with algorithms from the existing literature on synthetic, social and biological networks and show that its accuracy is superior or competitive. Beyond its computational advantages over other community-detection algorithms, the EPCA provides a simple means of discovering both dense and sparse subgroups based on structural equivalence or homogeneous roles, which may go undetected by algorithms that rely on edge-density measures to find subgroups.
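
As a rough illustration of the objects named in the abstract, the sketch below uses the standard characterization of a cograph as a graph with no induced path on four vertices (P4). The abstract does not define the edge P4 centrality, so the per-edge count computed here (the number of induced P4s an edge lies on) is only an assumed stand-in, and the function names `induced_p4s`, `is_cograph` and `edge_p4_counts` are hypothetical. The enumeration is brute force and is meant for toy graphs only; it is not the EPCA itself.

```python
from collections import Counter
from itertools import combinations, permutations


def induced_p4s(adj):
    """Yield every induced path a-b-c-d exactly once (with a < d).

    adj maps each vertex to the set of its neighbours (undirected graph).
    Brute force over all 4-vertex subsets: fine for toy graphs only.
    """
    for quad in combinations(sorted(adj), 4):
        for a, b, c, d in permutations(quad):
            if a > d:  # each path appears in two orders; keep one
                continue
            is_path = b in adj[a] and c in adj[b] and d in adj[c]
            no_chords = c not in adj[a] and d not in adj[a] and d not in adj[b]
            if is_path and no_chords:
                yield (a, b, c, d)


def is_cograph(adj):
    """A graph is a cograph exactly when it has no induced P4."""
    return next(induced_p4s(adj), None) is None


def edge_p4_counts(adj):
    """Count, for each edge, the induced P4s it lies on.

    ASSUMPTION: a stand-in for the paper's edge P4 centrality,
    which the abstract names but does not define.
    """
    counts = Counter({tuple(sorted((u, v))): 0 for u in adj for v in adj[u]})
    for a, b, c, d in induced_p4s(adj):
        for u, v in ((a, b), (b, c), (c, d)):
            counts[tuple(sorted((u, v)))] += 1
    return counts


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

counts = edge_p4_counts(barbell)
print(is_cograph(barbell))        # False: e.g. 0-2-3-4 is an induced P4
print(counts.most_common(1))      # [((2, 3), 4)]: the bridge is on all four P4s

# Removing the bridge leaves two triangles, which together form a cograph.
split = {v: set(nbrs) for v, nbrs in barbell.items()}
split[2].discard(3)
split[3].discard(2)
print(is_cograph(split))          # True
```

In this toy graph the bridge lies on more induced P4s than any other edge, and deleting it leaves only P4-free components. That is the intuition a divisive scheme could exploit, though the actual removal rule and stopping criterion of the EPCA are not given in the abstract.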
