QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites

In this paper, we present a framework for Question Difficulty and Expertise Estimation (QDEE) in Community Question Answering sites (CQAs) such as Yahoo! Answers and Stack Overflow, which tackles a fundamental challenge in crowdsourcing: how to appropriately route and assign questions to users with suitable expertise. This problem domain has been the subject of much research and includes both language-agnostic and language-conscious solutions. We bring to bear a key language-agnostic insight: that users gain expertise over time and therefore tend to ask, as well as answer, increasingly difficult questions. We use this insight within the popular competition (directed) graph model to estimate question difficulty and user expertise by identifying key hierarchical structure within the model. An important and novel contribution here is the application of "social agony" to this problem domain. Difficulty levels of newly posted questions (the cold-start problem) are estimated using our QDEE framework together with additional textual features. We also propose a model that routes newly posted questions to appropriate users based on the difficulty level of the question and the expertise of the user. Extensive experiments on real-world CQA data from Yahoo! Answers and Stack Overflow demonstrate the improved efficacy of our approach over contemporary state-of-the-art models. The QDEE framework also allows us to characterize user expertise in novel ways, identifying interesting patterns and roles played by different users in such CQAs.
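To make the competition-graph idea concrete, the sketch below builds a toy competition graph (an edge from each asker to the user who gave the best answer, since the answerer demonstrated higher expertise on that question) and assigns hierarchy levels by longest-path depth. The edge "agony" function shows the quantity the social-agony formulation penalizes: edges that point down the hierarchy. The event format, user names, and the longest-path heuristic are illustrative assumptions, not the paper's actual algorithm; on cyclic graphs one needs the agony-minimization and cycle-breaking machinery the paper builds on.

```python
from collections import defaultdict, deque

def build_competition_graph(events):
    """Build the competition graph from (asker, best_answerer) pairs:
    a directed edge asker -> best_answerer records that the answerer
    demonstrated more expertise on that question."""
    succ = defaultdict(set)
    for asker, best_answerer in events:
        if asker != best_answerer:
            succ[asker].add(best_answerer)
    return succ

def edge_agony(rank, u, v):
    """An edge u -> v is expected to go up the hierarchy (rank[u] < rank[v]).
    Its agony measures how far it violates that: max(0, rank[u] - rank[v] + 1)."""
    return max(0, rank[u] - rank[v] + 1)

def hierarchy_ranks(succ):
    """Assign expertise levels as longest-path depth via Kahn's algorithm.
    On an acyclic competition graph this yields zero total agony; this is
    only an illustration, not the paper's agony-minimization procedure."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    indeg = {v: 0 for v in nodes}
    for u in succ:
        for v in succ[u]:
            indeg[v] += 1
    rank = {v: 0 for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for v in succ.get(u, ()):
            rank[v] = max(rank[v], rank[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return rank

# Hypothetical answering events: "ann" asked questions best answered by
# "bob" and "cai"; "bob" asked one best answered by "cai".
events = [("ann", "bob"), ("bob", "cai"), ("ann", "cai")]
graph = build_competition_graph(events)
ranks = hierarchy_ranks(graph)  # ann=0, bob=1, cai=2; total agony 0
```

Under this ranking, a newly posted question could be assigned the expertise level of its asker's neighborhood in the hierarchy, and routed to users whose rank matches or exceeds that level.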
