User community detection via embedding of social network structure and temporal content

Abstract Identifying and extracting user communities is an important step towards understanding social network dynamics from a macro perspective. This paper therefore explores several aspects of user community identification. To date, user community detection methods employ explicit links between users (link analysis), users’ topics of interest in posted content (content analysis), or both in tandem. Little work has considered temporal evolution when identifying user communities, so as to group together users who share not only similar topical interests but also similar temporal behavior towards those topics. In this paper, we identify user communities through multimodal feature learning (embeddings). Our core contributions are as follows: (a) we propose a new method for learning neural embeddings for users based on their temporal content similarity; (b) we learn user embeddings based on their social network connections (links) through neural graph embeddings; (c) we systematically interpolate temporal content-based embeddings and social link-based embeddings to capture both social network connections and temporal content evolution for representing users; and (d) we systematically evaluate the quality of each embedding type in isolation and when interpolated together, and demonstrate their performance on a Twitter dataset under two application scenarios, namely news recommendation and user prediction.
We find that (1) content-based methods produce higher-quality communities than link-based methods; (2) methods that consider the temporal evolution of content, our proposed method in particular, perform better than their non-temporal counterparts; (3) communities produced when time is explicitly incorporated in user vector representations have higher quality than those produced when time is incorporated into a generative process; and (4) although link-based methods are weaker than content-based methods, interpolating them with content-based methods improves the quality of the identified communities.
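The abstract does not state how the two embedding types are interpolated. As a purely illustrative sketch, one common reading is a weighted linear combination of the user-to-user similarities computed in each embedding space. The function names, the mixing weight `alpha`, the use of cosine similarity, and the random toy embeddings below are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def interpolate_similarities(
    content_emb: np.ndarray, link_emb: np.ndarray, alpha: float = 0.5
) -> np.ndarray:
    """Hypothetical linear interpolation of two user-user similarity matrices.

    alpha = 1.0 uses only the content-based space,
    alpha = 0.0 uses only the link-based space.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if content_emb.shape[0] != link_emb.shape[0]:
        raise ValueError("both embedding sets must cover the same users")
    content_sim = cosine_similarity_matrix(content_emb)
    link_sim = cosine_similarity_matrix(link_emb)
    return alpha * content_sim + (1.0 - alpha) * link_sim


# Toy data: 4 users, with embeddings of different dimensionality per modality.
rng = np.random.default_rng(0)
content_emb = rng.normal(size=(4, 8))  # stand-in for temporal content embeddings
link_emb = rng.normal(size=(4, 5))     # stand-in for graph (link) embeddings

combined_sim = interpolate_similarities(content_emb, link_emb, alpha=0.7)
```

Combining at the similarity level, rather than concatenating vectors, lets the two embedding spaces have different dimensions. The resulting matrix could then be passed to any clustering or graph-partitioning routine that accepts pairwise similarities in order to extract communities.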
