GEMSEC: Graph Embedding with Self Clustering

Modern graph embedding procedures can efficiently process graphs with millions of nodes. In this paper, we propose GEMSEC, a graph embedding algorithm that learns a clustering of the nodes at the same time as it computes their embedding. GEMSEC is a general extension of earlier work on sequence-based graph embedding. It places nodes in an abstract feature space where the vertex features minimize the negative log-likelihood of preserving sampled vertex neighborhoods, and it incorporates known social network properties through a regularization term. We present two new social network datasets and show that, by treating the embedding and clustering problems jointly with respect to social properties, GEMSEC extracts high-quality clusters that are competitive with or superior to those of other community detection algorithms. In experiments, the method is computationally efficient and robust to the choice of hyperparameters.
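To make the joint objective concrete, here is a minimal sketch of one way to combine a neighborhood-preservation term with a clustering term. It is not the authors' implementation. The abstract does not give the formula, so the full-softmax likelihood, the nearest-center distance cost, and the weight `gamma` are all assumptions, as are the function names `neighborhood_nll`, `clustering_cost`, and `combined_loss`.

```python
import math


def neighborhood_nll(emb, source, neighbors):
    """Negative log-likelihood of the sampled neighbors of `source`.

    Assumption: a skip-gram style softmax over all nodes, scored by the
    dot product of embedding vectors.
    """
    scores = [sum(a * b for a, b in zip(emb[source], emb[u]))
              for u in range(len(emb))]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return sum(log_norm - scores[n] for n in neighbors)


def clustering_cost(emb, centers):
    """Sum over nodes of the distance to the nearest cluster center."""
    return sum(min(math.dist(x, c) for c in centers) for x in emb)


def combined_loss(emb, centers, samples, gamma):
    """Embedding loss plus a gamma-weighted clustering cost.

    `samples` maps a source node index to a list of sampled neighbor
    indices (for example, nodes seen near it on random walks).
    """
    nll = sum(neighborhood_nll(emb, v, nbrs) for v, nbrs in samples.items())
    return nll + gamma * clustering_cost(emb, centers)


if __name__ == "__main__":
    # Toy data: three nodes in two dimensions, two cluster centers.
    emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    centers = [[1.0, 0.0], [0.0, 1.0]]
    samples = {0: [1], 2: [0]}
    print(combined_loss(emb, centers, samples, gamma=0.5))
```

In a sketch like this, both the embedding vectors and the cluster centers would be updated by gradient steps on `combined_loss`, so the clustering is learned together with the embedding instead of being fitted afterwards.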
