Improved Community Detection using Deep Embeddings from Multilayer Graphs

Community detection is a challenging, yet crucial, problem in mining large-scale graph-structured data. Most existing approaches solve this problem by mapping nodes into a vector space and performing unsupervised learning with the resulting embeddings. In cases where multiple types of connectivity patterns exist for the set of nodes, commonly modeled as multilayer graphs, new strategies are required to model the inter-layer dependencies in order to perform effective inference. In this paper, we focus on learning embeddings for each node of a multilayer graph through neural modeling techniques, such that the complex dependencies can be concisely encoded into low-dimensional representations. Referred to as multilayer graph embeddings, these representations can be utilized for discovering community structure in a scalable fashion, even with a large number of layers. Furthermore, to ensure that semantics that persist over a longer range in the network are well modeled, we propose to refine the multilayer embeddings via a proxy clustering loss and a graph modularity measure. Using real-world datasets, we demonstrate that this algorithm generates scalable and robust representations, and outperforms existing multilayer community detection approaches.

Introduction

Community Detection in Multilayer Graphs: Graphs are natural data structures to represent relational data, and hence modeling and inference with graph-structured data have become central to a wide range of applications, such as social network analysis (Eagle and Pentland 2006), recommendation systems (Rao et al. 2015), and neurological modeling (Fornito, Zalesky, and Breakspear 2013). Though some of these applications require supervised or semi-supervised learning formulations, mining large networks to identify cohesive clusters of densely connected nodes is a highly prevalent idea in the graph mining literature (Blondel et al. 2008; Kim and Lee 2015).
Referred to as community detection, this unsupervised learning problem is most commonly addressed by mapping nodes into a vector space and clustering the resulting embeddings (Dong et al. 2012; Ding, Lin, and Ishwar 2016; Yang et al. 2016). These latent low-dimensional embeddings can be inferred by optimizing a variety of measures that describe the network structure; examples include decomposition of the graph Laplacian matrix (Ng, Jordan, and Weiss 2002), stochastic factorization of the adjacency matrix (Ahmed et al. 2013; Tang et al. 2015), and decomposition of the modularity matrix (Newman 2006; Chen, Kuzmin, and Szymanski 2014; Yang et al. 2016).

∗This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Until recently, the majority of existing work focused on discovering community structure from a single network. However, with the emergence of multiview network data in real-world scenarios, commonly represented as multilayer graphs, community detection has become more challenging. In general, multilayer graphs provide complementary views of connectivity patterns for the same set of nodes, so the complex dependency structure across the views must be modeled. The heterogeneity in the relationships, while providing richer information, makes statistical inference challenging. Furthermore, the varying levels of sparsity in different layers and the inherent uncertainties in neighborhoods, e.g. noisy edges or outliers, add to the complexity of this problem.
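To make the single-layer embed-then-cluster recipe concrete, the following minimal sketch computes a one-dimensional spectral embedding from the graph Laplacian and splits the nodes by sign. It is a deliberate simplification of the Laplacian-based approach cited above: it uses the unnormalized Laplacian, a plain power iteration, and a sign threshold instead of k-means. The function name `fiedler_vector` and the toy graph (two triangles joined by one edge) are illustrative assumptions, not taken from the paper.

```python
def fiedler_vector(adj, iters=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of the
    unnormalized Laplacian L = D - A by power iteration on (shift*I - L)."""
    n = len(adj)
    # By Gershgorin's theorem, 2 * max degree bounds the largest eigenvalue of L.
    shift = 2 * max(len(nbrs) for nbrs in adj)
    # Deterministic zero-mean start; it must not be orthogonal to the target vector.
    x = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        # (shift*I - L) x, using (L x)_i = deg(i) * x_i - sum of x_j over neighbors j
        y = [(shift - len(adj[i])) * x[i] + sum(x[j] for j in adj[i])
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]            # project out the constant eigenvector
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


# Toy graph (assumed for illustration): triangles {0,1,2} and {3,4,5}, bridge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
embedding = fiedler_vector(adj)              # one latent dimension per node (d = 1)
labels = [int(v > 0) for v in embedding]     # crude two-way clustering by sign
```

On this toy graph the sign split separates the two triangles. With more than two communities one would keep several eigenvectors and cluster the rows, which is where the eigenvalue decomposition becomes the cost bottleneck on large graphs.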
Existing work on community detection from multilayer graphs can be broadly categorized into (a) methods that obtain a consensus community structure by fusing information from different layers and producing a single community label for each set of corresponding nodes (Dong et al. 2012; Dong et al. 2014; Kim, Lee, and Lim 2017; Tagarelli, Amelio, and Gullo 2017); and (b) methods that infer a separate embedding for a node in every layer, while exploiting the inter-layer dependencies, and produce multiple potential community associations for each node (Mucha et al. 2010; Bazzi et al. 2016). In this paper, we address the problem of building effective latent embeddings for nodes on every layer from multilayer graph data, and our approach falls in the latter category.

arXiv:1811.12156v1 [cs.SI] 20 Sep 2018

Constructing Node Embeddings: At their core, node embedding approaches attempt to identify low-rank representations that best represent the network topology. Despite their broad applicability, several of these approaches produce linear embeddings for nodes, naturally motivating the use of deep neural networks to produce more expressive, non-linear embeddings. Consequently, stacked graph auto-encoder style solutions have been proposed (Yang et al. 2016) that directly transform the objective measure (e.g. the modularity matrix) into an undercomplete representation through a reconstruction cost. In addition to producing non-linear mappings, deep learning approaches enable the use of robust reconstruction losses in lieu of a simple $\ell_2$ measure (Thiagarajan et al. 2016), and support the inclusion of additional prior constraints on community structure (Yang et al. 2016). A known limitation of node embedding techniques has been their poor scalability to large-scale graphs (e.g. the cost of eigenvalue decomposition), and this issue persists even with graph autoencoders.
In order to combat this limitation, recent approaches, such as DeepWalk (Perozzi, Al-Rfou, and Skiena 2014) and Node2Vec (Grover and Leskovec 2016), have resorted to a distributional hypothesis, popularly adopted in language modeling (Harris 1954), where co-occurrence of two nodes in short random walks implies a strong notion of semantic similarity. As a result, by extending highly scalable neural embedding techniques such as Word2Vec (Mikolov et al. 2013) to the construction of node embeddings, one can obtain state-of-the-art results in community detection with single-layer graphs.

Proposed Work: In this paper, we develop a novel scalable technique for obtaining deep node embeddings from multilayer graphs. We show that a naïve extension of DeepWalk to the multilayer case, which performs independent random walks on each of the layers, can be worse than even simple baselines, thus emphasizing the need to explicitly model dependencies across the different layers. Consequently, we propose to parameterize virtual edges to allow information flow between layers. Furthermore, the premise of using short random walks to infer the underlying semantic structure relies on the assumption that the networks are highly sparse and the node co-occurrences follow a power law. However, once inter-layer edges are allowed, that assumption can be violated in cases where the semantics persist over even longer walks. We address this challenge by including a refinement stage, where the multilayer embeddings are fine-tuned to produce more cohesive communities. In particular, we use an entropy-based proxy clustering cost and modularity-based refinement. We show that the proposed approach is highly effective for as many as 37 layers and that it outperforms existing approaches for multilayer community detection.
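The naïve multilayer baseline described above, with independent short random walks on each layer, can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`random_walk`, `build_corpus`, `cooccurrence`), the two toy layers, and the walk parameters are hypothetical, and the virtual inter-layer edges proposed in the paper are intentionally not modeled because their parameterization is not given in this section.

```python
import random
from collections import Counter


def random_walk(adj, start, length, rng):
    """Uniform random walk visiting up to `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:                    # isolated node: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(adj, walks_per_node, length, rng):
    """Generate the 'sentences' of a layer: several walks rooted at every node."""
    corpus = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)
        corpus.extend(random_walk(adj, v, length, rng) for v in nodes)
    return corpus


def cooccurrence(corpus, window):
    """Count how often two distinct nodes appear within `window` steps of each other."""
    counts = Counter()
    for walk in corpus:
        for i, u in enumerate(walk):
            for v in walk[i + 1:i + 1 + window]:
                if u != v:
                    counts[frozenset((u, v))] += 1
    return counts


# Two hypothetical layers over the same node set {0, ..., 5}.
layers = {
    "layer_a": {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]},
    "layer_b": {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]},
}
rng = random.Random(0)
# Naive extension: each layer is walked independently, so no information crosses layers.
corpora = {name: build_corpus(adj, walks_per_node=10, length=5, rng=rng)
           for name, adj in layers.items()}
counts = {name: cooccurrence(corpus, window=2) for name, corpus in corpora.items()}
```

Each per-layer corpus could then be passed to a Word2Vec-style model to obtain one embedding per node per layer. Because the walks never leave their layer, the resulting embeddings ignore inter-layer dependencies, which is the weakness the text attributes to this baseline.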
Mathematical Preliminaries

Definitions: A single-layer undirected, unweighted graph is represented by $G = (V, E)$, where $V$ denotes the set of nodes with cardinality $|V| = N$, and $E$ denotes the set of edges. The goal of embedding techniques is to generate latent representations, $X \in \mathbb{R}^{N \times d}$, where $d$ is the desired number of latent dimensions. A multilayer graph is represented using a set of $L$ inter-dependent graphs $G^{(l)} = (V^{(l)}, E^{(l)})$, for $l = 1, \ldots, L$, where there exists a node mapping between every pair of layers to indicate which vertices in one graph correspond to vertices in the other.

Deep Embeddings for Network Analysis: The scalability challenge of factorization techniques has motivated the use of deep learning methods to obtain node embeddings. The earliest work to report results in this direction was the DeepWalk algorithm by Perozzi et al. (Perozzi, Al-Rfou, and Skiena 2014). Interestingly, it draws an analogy between node sequences generated by short random walks on graphs and sentences in a document corpus. Given this formulation, the authors utilize popular language modeling tools to obtain latent representations for the nodes (Mikolov et al. 2013). Let us consider a simple metric walk $W_t$ in step $t$, which is rooted at the vertex $v_i$. The transition probability between the nodes $v_i$ and $v_j$ can be expressed as

$$P(W_{t+1} = v_j \mid W_t = v_i) = h\left(\|x_i - x_j\|_2 / \sigma\right), \tag{1}$$

where $\|x_i - x_j\|_2$ indicates the similarity metric between the two vertices in the latent space to be recovered and $h$ is a linking function that connects the vertex similarity to the actual co-occurrence probability. With an appropriate choice of the walk length, the true metric can be recovered accurately from the co-occurrence statistics constructed using random walks. Furthermore, the authors note that the frequency with which vertices appear in the short random walks follows a power-law distribution, similar to words in natural language. Given a length-$S$ sequence of words, $(w_0, w_1, \ldots, w_{S-1})$, where $w_s$ denotes a word in the vocabulary, neural word embeddings attempt to obtain vector spaces that can recover the likelihood of observing a word given its context, i.e., $P(w_s \mid w_0, w_1, \ldots, w_{s-1})$ over all sequences. Extending this idea to the case of graphs, a random walk on the nodes, starting from node $v_i$, produces the sequence analogous to sentences in language data.

Modularity based Community Detection: A popular measure used in community detection algorithms is the modularity function Q (Newman 2006), defined as the difference between the number of edges within cohesive communities and the expected number of
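As an illustration of the modularity function Q, the sketch below evaluates the textbook single-layer form of Newman's modularity for a given partition: the fraction of edges that fall inside communities minus the fraction expected if edges were placed at random with node degrees preserved. The function name `modularity`, the toy edge list, and the partitions are assumptions made for this example, and the variant used later in the paper may differ in detail.

```python
def modularity(edges, labels):
    """Newman's modularity for an undirected, unweighted graph.

    edges  : list of (u, v) pairs, each undirected edge listed once
    labels : dict mapping node -> community id
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Observed fraction of edges with both endpoints in the same community.
    intra = sum(1 for u, v in edges if labels[u] == labels[v]) / m
    # Expected fraction under the degree-preserving null model:
    # sum over communities of (total degree of the community / 2m)^2.
    community_degree = {}
    for node, k in degree.items():
        community_degree[labels[node]] = community_degree.get(labels[node], 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in community_degree.values())
    return intra - expected


# Toy graph (assumed): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_communities = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_community = {node: 0 for node in range(6)}

q_split = modularity(edges, two_communities)   # 6/7 - 1/2 = 5/14
q_merged = modularity(edges, one_community)    # 1 - 1 = 0
```

The partition that separates the two triangles scores higher than the trivial single-community partition, which is the behavior a modularity-based refinement stage would exploit.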

[1]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[2]  Pradeep Ravikumar,et al.  Collaborative Filtering with Graph Information: Consistency and Scalable Methods , 2015, NIPS.

[3]  Ali Farhadi,et al.  Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[4]  M. Newman,et al.  Finding community structure in networks using the eigenvectors of matrices. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[6]  Tsuyoshi Murata,et al.  {m , 1934, ACML.

[7]  Jieping Ye,et al.  Discriminative K-means for Clustering , 2007, NIPS.

[8]  Boleslaw K. Szymanski,et al.  Community Detection via Maximization of Modularity and Its Variants , 2014, IEEE Transactions on Computational Social Systems.

[9]  Andrea Tagarelli,et al.  Ensemble-based community detection in multilayer networks , 2017, Data Mining and Knowledge Discovery.

[10]  Pascal Frossard,et al.  Clustering on Multi-Layer Graphs via Subspace Analysis on Grassmann Manifolds , 2013, IEEE Transactions on Signal Processing.

[11]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[12]  Prakash Ishwar,et al.  Node Embedding via Word Embedding for Network Community Discovery , 2016, IEEE Transactions on Signal and Information Processing over Networks.

[13]  Jae-Gil Lee,et al.  Community Detection in Multi-Layer Graphs , 2015 .

[14]  Alex Pentland,et al.  Reality mining: sensing complex social systems , 2006, Personal and Ubiquitous Computing.

[15]  Mason A. Porter,et al.  Community Detection in Temporal Multilayer Networks, with an Application to Correlation Networks , 2014, Multiscale Model. Simul..

[16]  Jae-Gil Lee,et al.  Community Detection in Multi-Layer Graphs: A Survey , 2015, SGMD.

[17]  Giovanni Montana,et al.  Community detection in multiplex networks using Locally Adaptive Random walks , 2015, 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[18]  Michael Breakspear,et al.  Graph analysis of the human connectome: Promise, progress, and pitfalls , 2013, NeuroImage.

[20]  Pascal Frossard,et al.  Clustering With Multi-Layer Graphs: A Spectral Perspective , 2011, IEEE Transactions on Signal Processing.

[21]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[22]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[23]  Jukka-Pekka Onnela,et al.  Community Structure in Time-Dependent, Multiscale, and Multiplex Networks , 2009, Science.

[24]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[25]  Alexander J. Smola,et al.  Distributed large-scale natural graph factorization , 2013, WWW.

[27]  Xiaochun Cao,et al.  Modularity Based Community Detection with Deep Learning , 2016, IJCAI.

[28]  A. Arenas,et al.  Community detection in complex networks using extremal optimization. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[29]  Mohamed Bouguessa,et al.  Mining Community Structures in Multidimensional Networks , 2017, ACM Trans. Knowl. Discov. Data.