Category-specific models for ranking effective paraphrases in community Question Answering

Abstract

Platforms for community-based Question Answering (cQA) play an increasing role in the synergy of information seeking and social networks. Being able to categorize user questions is very important, since these categories are good predictors of the underlying question goal, viz. informational or subjective. Furthermore, an effective cQA platform should be capable of detecting similar past questions and relevant answers, because a large proportion of best answers are known to be reusable. Question paraphrasing is therefore not only a useful but an essential ingredient for effective search in cQA. However, the generated paraphrases do not necessarily lead to the same answer set, and they may differ in their expected retrieval quality, for example in their power to identify best answers and rank them higher. We propose a novel category-specific learning-to-rank approach for effectively ranking paraphrases for cQA. We describe a number of large-scale experiments using logs from Yahoo! Search and Yahoo! Answers, and demonstrate that the subjective or objective nature of cQA questions dramatically affects the recall and ranking of past answers once fine-grained category information is taken into account. Category-specific models adapt well to the differing degrees of objectivity and subjectivity of each category: the more specific the models, the better the results, especially when they benefit from effective semantic and syntactic features.
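The abstract does not spell out the learner, so the following is only a minimal sketch of what category-specific learning to rank for paraphrases could look like, under stated assumptions. Each paraphrase is a feature vector, paraphrases of the same original question share a group id, the label is a hypothetical retrieval-quality grade (for instance, how well the paraphrase ranks the best answer), and each category gets its own linear model trained with a RankSVM-style pairwise hinge loss by online subgradient descent, with a global model as a fallback for categories lacking data. The function names (`train_pairwise_linear`, `train_category_models`, `rank_paraphrases`), the features, the hyperparameters and the fallback rule are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def train_pairwise_linear(examples, dim, epochs=50, lr=0.1, reg=0.01):
    """Train a linear ranker with a pairwise hinge loss (RankSVM-style)
    by online subgradient descent (pairs visited in a fixed order).

    examples: iterable of (group_id, features, label). Pairs are only formed
    between paraphrases of the same original question (same group_id).
    """
    groups = defaultdict(list)
    for gid, x, y in examples:
        groups[gid].append((x, y))
    pairs = []
    for items in groups.values():
        for (xa, ya), (xb, yb) in combinations(items, 2):
            if ya != yb:
                # Orient each pair so the first vector is the better paraphrase.
                pairs.append((xa, xb) if ya > yb else (xb, xa))
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for i in range(dim):
                # Subgradient of reg/2 * ||w||^2 + max(0, 1 - w . diff).
                grad = reg * w[i] - (diff[i] if margin < 1.0 else 0.0)
                w[i] -= lr * grad
    return w


def train_category_models(examples, dim, min_examples=4):
    """examples: list of (category, group_id, features, label).

    Returns one model per category with at least `min_examples` examples,
    plus a global model trained on all categories as a fallback.
    """
    by_category = defaultdict(list)
    for category, gid, x, y in examples:
        by_category[category].append((gid, x, y))
    global_model = train_pairwise_linear([e[1:] for e in examples], dim)
    models = {
        category: train_pairwise_linear(exs, dim)
        for category, exs in by_category.items()
        if len(exs) >= min_examples
    }
    return models, global_model


def rank_paraphrases(paraphrases, category, models, global_model):
    """Sort (text, features) pairs by descending score under the model of
    `category`, falling back to the global model for unseen categories."""
    w = models.get(category, global_model)

    def score(features):
        return sum(wi * xi for wi, xi in zip(w, features))

    return sorted(paraphrases, key=lambda p: score(p[1]), reverse=True)


if __name__ == "__main__":
    # Toy data: features 0 and 1 are hypothetical paraphrase features;
    # the two categories prefer opposite ones.
    data = [
        ("Health", "q1", [1.0, 0.0], 2), ("Health", "q1", [0.0, 1.0], 0),
        ("Health", "q2", [0.9, 0.2], 1), ("Health", "q2", [0.1, 0.8], 0),
        ("Politics", "q3", [1.0, 0.0], 0), ("Politics", "q3", [0.0, 1.0], 2),
        ("Politics", "q4", [0.2, 0.9], 1), ("Politics", "q4", [0.8, 0.1], 0),
    ]
    models, global_model = train_category_models(data, dim=2)
    candidates = [("paraphrase A", [1.0, 0.0]), ("paraphrase B", [0.0, 1.0])]
    for cat in ("Health", "Politics"):
        ranked = rank_paraphrases(candidates, cat, models, global_model)
        print(cat, [text for text, _ in ranked])
```

Pairs are formed only within a group, so each model learns which paraphrase of a given question retrieves better rather than which questions are easier overall. In the toy data the two categories prefer opposite features; a single global model would give the same ranking to identical candidates regardless of category, which is the gap a per-category model is meant to close.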