Overlapping community detection and temporal analysis on Q&A sites

In many social networks, people interact through an explicit relationship network, and community detection algorithms are useful for revealing the sub-structures of such a network. Identifying these user communities can help support their life cycle. However, in certain kinds of online communities, such as question-and-answer (Q&A) sites or forums, people interact around common topics of interest rather than through an explicit relationship network, so many traditional community detection techniques do not apply directly. Discovering these topics of interest is therefore critical for identifying user communities. Moreover, users' activity on a given topic evolves over time, so it is important to extract these temporal dynamics as well. In this paper, we first propose Topic Trees Distributions (TTD), an efficient approach for extracting topics from Q&A sites in order to detect overlapping communities. We then extend TTD into Temporal Topic Expertise Activity (TTEA), a probabilistic graphical model that extracts both topic-based expertise and temporal information. We evaluated our models and compared them with state-of-the-art approaches on a dataset extracted from the popular Q&A site StackOverflow.
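
The abstract does not say how TTD turns extracted topics into communities. The following is only a minimal sketch of the general idea that topics can induce overlapping communities, not the TTD algorithm. It assumes that each user already has a probability distribution over topics (for instance, estimated by a topic model such as LDA) and places a user in every topic community whose probability reaches a threshold. The function name `overlapping_communities`, the threshold value, and the toy data are all hypothetical.

```python
from collections import defaultdict


def overlapping_communities(user_topic, threshold=0.2):
    """Hypothetical helper (not TTD): assign each user to every topic whose
    probability is at least `threshold`.

    `user_topic` maps a user id to a list of topic probabilities, i.e. one
    row of an assumed user-topic distribution. Because a user can pass the
    threshold for several topics, the resulting communities may overlap.
    Returns {topic_index: sorted list of user ids}, ordered by topic index.
    """
    communities = defaultdict(set)
    for user, dist in user_topic.items():
        for topic, prob in enumerate(dist):
            if prob >= threshold:
                communities[topic].add(user)
    # Sort topics and members so the output is deterministic.
    return {topic: sorted(users) for topic, users in sorted(communities.items())}


# Toy, made-up distributions over three topics (not data from the paper).
theta = {
    "alice": [0.70, 0.25, 0.05],
    "bob": [0.10, 0.45, 0.45],
    "carol": [0.05, 0.05, 0.90],
}
print(overlapping_communities(theta, threshold=0.2))
# {0: ['alice'], 1: ['alice', 'bob'], 2: ['bob', 'carol']}
```

Here "alice" and "bob" each belong to two communities, which is the overlap that a single hard partition of users could not express. The threshold trades off overlap against community size and would need tuning on real data.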
