Detecting topics and overlapping communities in question and answer sites

In many social networks, people interact based on their interests. Community detection algorithms are therefore useful for revealing the sub-structures of a network, and in particular its interest groups. Identifying these user communities and the interests that bind them can help support the communities throughout their life cycle. Certain kinds of online communities, such as question-and-answer (Q&A) sites, have no explicit social network structure, so many traditional community detection techniques do not apply directly. In this paper, we propose an efficient approach for extracting topics from Q&A sites in order to detect communities of interest. We then compare three detection methods that we applied to a dataset extracted from the popular Q&A site StackOverflow. Our method, based on topic modeling and user membership assignment, is shown to be much simpler and faster while preserving the quality of the detection.
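As an illustration only, the membership-assignment idea can be sketched as follows. The abstract does not specify the assignment rule, so everything here is an assumption: the per-post topic distributions are taken as already produced by some topic model, a user's profile is the plain average of their posts' distributions, and a user joins every topic community whose average weight reaches a hypothetical `threshold`. The function names and toy data are invented for the example.

```python
from collections import defaultdict


def user_topic_profiles(posts):
    """Average the topic distributions of each user's posts.

    `posts` is a list of (user, topic_distribution) pairs; the
    distributions are assumed to come from a topic model run beforehand.
    """
    sums = {}
    counts = defaultdict(int)
    for user, dist in posts:
        if user not in sums:
            sums[user] = [0.0] * len(dist)
        for k, p in enumerate(dist):
            sums[user][k] += p
        counts[user] += 1
    return {u: [s / counts[u] for s in vec] for u, vec in sums.items()}


def overlapping_communities(profiles, threshold=0.3):
    """Put each user in every topic community whose weight >= threshold.

    The threshold rule is an assumption made for this sketch; a user may
    land in several communities, which is what makes them overlapping.
    """
    communities = defaultdict(set)
    for user, profile in profiles.items():
        for k, weight in enumerate(profile):
            if weight >= threshold:
                communities[k].add(user)
    return dict(communities)


# Toy data: three topics, hypothetical users and distributions.
posts = [
    ("alice", [0.8, 0.1, 0.1]),
    ("alice", [0.2, 0.7, 0.1]),
    ("bob", [0.1, 0.1, 0.8]),
    ("carol", [0.5, 0.0, 0.5]),
]
profiles = user_topic_profiles(posts)
communities = overlapping_communities(profiles, threshold=0.3)
```

In this toy run, a user whose posts spread over two topics ends up in both communities, with no graph structure between users required.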
