An empirical study of topic-sensitive probabilistic model for expert finding in question answer communities

In this article, we study the problem of finding experts in community question answering (CQA). Most existing approaches find experts in CQA via link analysis, and a primary challenge is how to improve authority-score ranking using information about the users. However, these link analysis techniques largely fail to consider the interests, expertise, and reputation of users (both question askers and answerers). To address this limitation, we propose a topic-sensitive probabilistic model that extends the PageRank algorithm and finds experts in the community more effectively by incorporating link analysis and user analysis into a unified framework. We have conducted extensive experiments on a real-world, English-language data set from Yahoo! Answers. The results show that our method significantly outperforms existing link analysis techniques and advances the state of the art in expert finding for CQA.
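The abstract does not give the model's equations. As a rough illustration only, the Python sketch below shows the general idea of a topic-sensitive, PageRank-style walk over a CQA graph in which each answer is a link from the asker to the answerer. Several parts are assumptions of this sketch and not the authors' formulation: the edge weight (answer count times a topical similarity 1 - |θ_i,t - θ_j,t|), the teleport vector proportional to each user's interest in topic t, the hypothetical function name `topic_sensitive_rank`, and the toy data. The per-user topic distributions `theta` are assumed to come from a topic model such as LDA and are not computed here.

```python
from collections import defaultdict


def topic_sensitive_rank(answers, theta, topic, damping=0.85, iters=200, tol=1e-12):
    """Hypothetical sketch: rank users for one topic with a PageRank-style walk.

    answers: list of (asker, answerer) pairs, one per answer; treating an answer
        as a link from asker to answerer is an assumption of this sketch.
    theta: dict user -> list of topic probabilities (assumed given, e.g. by LDA).
    topic: index of the topic whose experts we want.
    """
    users = sorted(theta)
    # Assumed teleport vector: jump to users in proportion to their interest in `topic`.
    total_interest = sum(theta[u][topic] for u in users)
    if total_interest > 0:
        teleport = {u: theta[u][topic] / total_interest for u in users}
    else:
        teleport = {u: 1.0 / len(users) for u in users}

    # Assumed edge weight: each answer contributes the topical similarity
    # 1 - |theta_asker - theta_answerer| for the chosen topic.
    weight = defaultdict(float)
    for asker, answerer in answers:
        weight[(asker, answerer)] += 1.0 - abs(theta[asker][topic] - theta[answerer][topic])
    out_total = defaultdict(float)
    for (asker, _), w in weight.items():
        out_total[asker] += w

    rank = dict(teleport)
    for _ in range(iters):
        # Users with no outgoing weight (e.g. never asked) are dangling nodes;
        # their mass is redistributed along the teleport vector.
        dangling = sum(rank[u] for u in users if out_total[u] == 0.0)
        new = {u: (1 - damping + damping * dangling) * teleport[u] for u in users}
        for (asker, answerer), w in weight.items():
            if w > 0:
                new[answerer] += damping * rank[asker] * w / out_total[asker]
        delta = sum(abs(new[u] - rank[u]) for u in users)
        rank = new
        if delta < tol:
            break
    return sorted(users, key=rank.get, reverse=True), rank


# Toy data (invented for illustration): two topics, four users.
theta = {
    "alice": [0.9, 0.1],
    "bob": [0.2, 0.8],
    "carol": [0.7, 0.3],
    "dave": [0.5, 0.5],
}
answers = [("bob", "alice"), ("dave", "alice"), ("carol", "alice"),
           ("alice", "carol"), ("bob", "carol")]
order, scores = topic_sensitive_rank(answers, theta, topic=0)
print(order[0])  # alice
```

In this toy graph, alice answers questions from three users and has the strongest interest in topic 0, so she ranks first; bob, who only asks, ranks last. Because dangling mass is redistributed, the scores always sum to 1, which keeps them comparable across topics.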
