A Unified Framework for Community Detection and Network Representation Learning

Network representation learning (NRL) aims to learn low-dimensional vectors for vertices in a network. Most existing NRL methods focus on learning representations from local context of vertices (such as their neighbors). Nevertheless, vertices in many complex networks also exhibit significant global patterns widely known as communities. It's intuitive that vertices in the same community tend to connect densely and share common attributes. These patterns are expected to improve NRL and benefit relevant evaluation tasks, such as link prediction and vertex classification. Inspired by the analogy between network representation learning and text modeling, we propose a unified NRL framework by introducing community information of vertices, named as Community-enhanced Network Representation Learning (CNRL). CNRL simultaneously detects community distribution of each vertex and learns embeddings of both vertices and communities. Moreover, the proposed community enhancement mechanism can be applied to various existing NRL models. In experiments, we evaluate our model on vertex classification, link prediction, and community detection using several real-world datasets. The results demonstrate that CNRL significantly and consistently outperforms other state-of-the-art methods while verifying our assumptions on the correlations between vertices and communities.

[1]  Jure Leskovec,et al.  Community Detection in Networks with Node Attributes , 2013, 2013 IEEE 13th International Conference on Data Mining.

[2]  John E. Hopcroft,et al.  Using community information to improve the precision of link prediction methods , 2012, WWW.

[3]  Sune Lehmann,et al.  Link communities reveal multiscale complexity in networks , 2009, Nature.

[4]  Krishna P. Gummadi,et al.  You are who you know: inferring user profiles in online social networks , 2010, WSDM '10.

[5]  Kevin Chen-Chuan Chang,et al.  User profiling in an ego network: co-profiling attributes and relationships , 2014, WWW.

[6]  Linyuan Lu,et al.  Link Prediction in Complex Networks: A Survey , 2010, ArXiv.

[7]  Ling Huang,et al.  Joint Link Prediction and Attribute Inference Using a Social-Attribute Network , 2014, TIST.

[8]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[9]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[10]  Yoshihiro Yamanishi,et al.  propagation: A fast semisupervised learning algorithm for link prediction , 2009 .

[11]  Tamara G. Kolda,et al.  Temporal Link Prediction Using Matrix and Tensor Factorizations , 2010, TKDD.

[12]  Richi Nayak,et al.  Analyzing the Effectiveness of Graph Metrics for Anomaly Detection in Online Social Networks , 2012, WISE.

[13]  Andrew McCallum,et al.  Automating the Construction of Internet Portals with Machine Learning , 2000, Information Retrieval.

[14]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[15]  Jian Li,et al.  Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec , 2017, WSDM.

[16]  Charles Elkan,et al.  Link Prediction via Matrix Factorization , 2011, ECML/PKDD.

[17]  Tiago P. Peixoto Model selection and hypothesis testing for large-scale network models with overlapping groups , 2014, ArXiv.

[18]  Zhiyuan Liu,et al.  PLDA+: Parallel latent dirichlet allocation with data placement and pipeline processing , 2011, TIST.

[19]  Jure Leskovec,et al.  Community-Affiliation Graph Model for Overlapping Network Community Detection , 2012, 2012 IEEE 12th International Conference on Data Mining.

[20]  Zhiyuan Liu,et al.  PRISM , 2017, ACM Trans. Intell. Syst. Technol..

[21]  Xuanjing Huang,et al.  Incorporate Group Information to Enhance Network Embedding , 2016, CIKM.

[22]  M E J Newman,et al.  Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[23]  Deli Zhao,et al.  Network Representation Learning with Rich Text Information , 2015, IJCAI.

[24]  Wei Lu,et al.  Deep Neural Networks for Learning Graph Representations , 2016, AAAI.

[25]  John D. Burger,et al.  Discriminating Gender on Twitter , 2011, EMNLP.

[26]  Qiongkai Xu,et al.  GraRep: Learning Graph Representations with Global Structural Information , 2015, CIKM.

[27]  Hang Li,et al.  Variation Autoencoder Based Network Representation Learning for Classification , 2017, ACL.

[28]  Edward Y. Chang,et al.  PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications , 2009, AAIM.

[29]  Wenwu Zhu,et al.  Structural Deep Network Embedding , 2016, KDD.

[30]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[31]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[32]  Bamshad Mobasher,et al.  Personalized recommendation in social tagging systems using hierarchical clustering , 2008, RecSys '08.

[33]  Michael R. Lyu,et al.  Incorporating Implicit Link Preference Into Overlapping Community Detection , 2015, AAAI.

[34]  Nagiza F. Samatova,et al.  Community-based anomaly detection in evolutionary networks , 2012, Journal of Intelligent Information Systems.

[35]  Huan Liu,et al.  Relational learning via latent social dimensions , 2009, KDD.

[36]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[37]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[38]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[39]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[40]  D. Hand,et al.  Bayesian anomaly detection methods for social networks , 2010, 1011.1788.

[41]  Alex Pothen,et al.  PARTITIONING SPARSE MATRICES WITH EIGENVECTORS OF GRAPHS* , 1990 .

[42]  W. Zachary,et al.  An Information Flow Model for Conflict and Fission in Small Groups , 1977, Journal of Anthropological Research.

[43]  Jure Leskovec,et al.  Overlapping community detection at scale: a nonnegative matrix factorization approach , 2013, WSDM.

[44]  Boleslaw K. Szymanski,et al.  Overlapping community detection in networks: The state-of-the-art and comparative study , 2011, CSUR.

[45]  Fei Wang,et al.  Community discovery using nonnegative matrix factorization , 2011, Data Mining and Knowledge Discovery.

[46]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[47]  Bo Zhang,et al.  Discriminative Deep Random Walk for Network Classification , 2016, ACL.

[48]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[49]  Dayou Liu,et al.  Discovering Communities from Social Networks: Methodologies and Applications , 2010, Handbook of Social Network Technologies.

[50]  Linyuan Lü,et al.  Predicting missing links via local information , 2009, 0901.0553.

[51]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.

[52]  Jian Pei,et al.  Community Preserving Network Embedding , 2017, AAAI.

[53]  Y. Elovici,et al.  Strangers Intrusion Detection - Detecting Spammers and Fake Proles in Social Networks Based on Topology Anomalies , 2012 .

[54]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[55]  M. McPherson,et al.  Birds of a Feather: Homophily in Social Networks , 2001 .

[56]  J. Kumpula,et al.  Sequential algorithm for fast clique percolation. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[57]  Zhiyuan Liu,et al.  TransNet: Translation-Based Network Representation Learning for Social Relation Extraction , 2017, IJCAI.

[58]  Sara Rosenthal,et al.  Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations , 2011, ACL.

[59]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[60]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockstructures , 2001 .

[61]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[62]  Margaret L. Kern,et al.  Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach , 2013, PloS one.

[63]  David Liben-Nowell,et al.  The link-prediction problem for social networks , 2007 .

[64]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[65]  Lise Getoor,et al.  Collective Classification in Network Data , 2008, AI Mag..

[66]  Ruslan Salakhutdinov,et al.  Revisiting Semi-Supervised Learning with Graph Embeddings , 2016, ICML.

[67]  Zhiyuan Liu,et al.  Max-Margin DeepWalk: Discriminative Learning of Network Representation , 2016, IJCAI.

[68]  S.,et al.  An Efficient Heuristic Procedure for Partitioning Graphs , 2022 .

[69]  Kevin Chen-Chuan Chang,et al.  Learning Community Embedding with Community Detection and Node Embedding on Graphs , 2017, CIKM.

[70]  Daniel R. Figueiredo,et al.  struc2vec: Learning Node Representations from Structural Identity , 2017, KDD.

[71]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[72]  C. Faloutsos,et al.  Anomaly Detection in Large Graphs , 2020 .

[73]  M. Newman Clustering and preferential attachment in growing networks. , 2001, Physical review. E, Statistical, nonlinear, and soft matter physics.

[74]  Zhiyuan Liu,et al.  CANE: Context-Aware Network Embedding for Relation Modeling , 2017, ACL.

[75]  Charu C. Aggarwal,et al.  Heterogeneous Network Embedding via Deep Architectures , 2015, KDD.

[76]  Hisashi Kashima,et al.  Fast and Scalable Algorithms for Semi-supervised Link Prediction on Static and Dynamic Graphs , 2010, ECML/PKDD.