Simplified Detection and Labeling of Overlapping Communities of Interest in Question-and-Answer Sites

In many social networks, people interact based on their interests. Community detection algorithms are therefore useful for revealing the sub-structures of a network, and interest groups in particular. Identifying these user communities and the interests that bind them can help support the communities throughout their life cycle. Certain kinds of online communities, such as question-and-answer (Q&A) sites or forums, have no explicit social network structure, so many traditional community detection techniques do not apply directly. In this paper, we propose TTD (Topic Trees Distributions), an efficient approach for extracting topics from Q&A sites in order to detect communities of interest. We then compare three detection methods applied to a dataset extracted from the popular Q&A site StackOverflow. Our method, based on topic modeling and user membership assignment, is shown to be much simpler and faster while preserving the quality of the detection.
