Quality-aware collaborative question answering: methods and evaluation

Community question answering (QA) portals contain questions and answers contributed by hundreds of millions of users. These archives are of great value if they can be used directly to answer new questions from any user. In this research, we address this collaborative QA task by drawing on the knowledge of the crowd in community QA portals such as Yahoo! Answers. Despite the popularity of these portals, it is well known that their answers vary widely in quality. We therefore propose a quality-aware framework for designing methods that select answers from a community QA portal by considering answer quality in addition to answer relevance. Besides using answer features to determine answer quality, we introduce several other quality-aware QA methods that derive answer quality from the expertise of answerers. Such expertise can be question independent or question dependent. We evaluate the proposed methods on a database of 95K questions and 537K answers obtained from Yahoo! Answers. Our experiments show that taking answer quality into account improves QA performance significantly. Furthermore, methods based on question-dependent expertise outperform methods that use answer features only. We also find that good answers exist beyond the best answers identified by Yahoo! Answers users.
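
The idea of selecting answers by quality as well as relevance can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the product `relevance * quality` as the combined score, the smoothed best-answer ratio as a question-independent expertise estimate, and the names `Answer`, `question_independent_expertise`, and `rank_answers`.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    answerer: str
    relevance: float  # hypothetical relevance score in [0, 1]


def question_independent_expertise(history):
    """Estimate each answerer's expertise from their answering record.

    `history` maps answerer -> (best_answer_count, total_answer_count).
    Add-one smoothing (an assumption of this sketch) keeps users with
    few answers from scoring exactly 0 or 1.
    """
    return {
        user: (best + 1) / (total + 2)
        for user, (best, total) in history.items()
    }


def rank_answers(answers, expertise, default_quality=0.5):
    """Order answers by relevance * quality, highest first.

    Answer quality is approximated here by the answerer's expertise;
    unknown answerers fall back to `default_quality`.
    """
    def score(answer):
        return answer.relevance * expertise.get(answer.answerer, default_quality)

    return sorted(answers, key=score, reverse=True)


# Illustrative data only: two answerers with different track records.
history = {"alice": (8, 10), "bob": (1, 10)}
expertise = question_independent_expertise(history)

candidates = [
    Answer("Restart the router.", "bob", relevance=0.9),
    Answer("Update the firmware, then restart.", "alice", relevance=0.6),
]
ranked = rank_answers(candidates, expertise)
```

In this toy example the less relevant answer ranks first because its answerer has a much stronger record, which is the kind of reordering a quality-aware method is meant to produce. A question-dependent variant would estimate expertise per topic or per question rather than from one global ratio.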
