CQArank: jointly model topics and expertise in community question answering

Community Question Answering (CQA) websites, where people share expertise on open platforms, have become large repositories of valuable knowledge. To get the most value out of these repositories, CQA services need to find the right experts, retrieve similar archived questions, and recommend the best answers to new questions. To tackle this cluster of closely related problems in a principled way, we propose the Topic Expertise Model (TEM), a novel probabilistic generative model with a Gaussian mixture model (GMM) hybrid, which jointly models topics and expertise by integrating textual content modeling with link structure analysis. Based on the TEM results, we propose CQARank to measure user interests and expertise scores under different topics. By leveraging a question answering history built from long-term community reviews and voting, our method can find experts who have both similar topical preferences and high topical expertise. Experiments on data from Stack Overflow, the largest CQA site focused on computer programming, show that our method achieves significant improvements over existing methods on multiple metrics.
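The abstract does not spell out how CQARank combines topical information with link structure, so the following is only a minimal sketch of the general idea, assuming a topic-sensitive, PageRank-style random walk over an asker-to-answerer graph. The function name `topical_pagerank`, the edge direction, the `topic_weight` input, and the damping value are all illustrative assumptions, not the paper's formulation.

```python
def topical_pagerank(edges, topic_weight, damping=0.85, iters=50):
    """Rank users for ONE topic with a topic-biased random walk.

    edges: list of (asker, answerer) pairs (assumed link structure).
    topic_weight: dict user -> non-negative weight for the topic,
        e.g. a topic-model estimate of the user's interest (assumption).
    Returns a dict user -> score; scores sum to 1.
    """
    users = sorted({u for edge in edges for u in edge} | set(topic_weight))
    total = sum(topic_weight.get(u, 0.0) for u in users)
    # Teleport distribution: proportional to topic weight, uniform if all zero.
    teleport = {
        u: (topic_weight.get(u, 0.0) / total if total > 0 else 1.0 / len(users))
        for u in users
    }
    out_links = {u: [] for u in users}
    for asker, answerer in edges:
        out_links[asker].append(answerer)

    score = dict(teleport)
    for _ in range(iters):
        nxt = {u: (1.0 - damping) * teleport[u] for u in users}
        for u in users:
            if out_links[u]:
                # Split this user's score evenly across the answerers they link to.
                share = damping * score[u] / len(out_links[u])
                for v in out_links[u]:
                    nxt[v] += share
            else:
                # Dangling user: redistribute according to the teleport distribution.
                for v in users:
                    nxt[v] += damping * score[u] * teleport[v]
        score = nxt
    return score


# Hypothetical toy data: alice and carol asked questions that bob answered.
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "dave")]
weights = {"alice": 1.0, "bob": 1.0, "carol": 1.0, "dave": 1.0}
scores = topical_pagerank(edges, weights)
```

In this sketch, running the walk once per topic with different `topic_weight` values would give each user a separate score per topic, which is the kind of per-topic ranking the abstract describes.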
