Question Quality Analysis and Prediction in Community Question Answering Services with Coupled Mutual Reinforcement

Community question answering services (CQAS) such as Yahoo! Answers provide a platform where people post questions and answer questions posed by others. Previous work has analyzed answer quality (AQ) based on answer-related features but has neglected the influence of question-related features on AQ. Other work has analyzed how asker- and question-related features affect question quality (QQ) in terms of the amount of user attention, the number of answers, and the question-solving latency, but has neglected the correlation between QQ and AQ (measured by the rating of the best answer), which is critical to quality of service (QoS). We address this problem in two ways. First, we additionally use AQ in measuring QQ, and we analyze the correlation between a comprehensive list of features (including answer-related features) and QQ. Second, we propose the first method that estimates the probability that a given question will obtain high AQ. Our analysis of the Yahoo! Answers trace confirms that the features we identified influence AQ, which in turn determines QQ. For the correlation analysis, previous classification algorithms cannot account for the mutual interactions among more than two classes of features. We therefore propose a novel Coupled Semi-Supervised Mutual Reinforcement-based Label Propagation (CSMRLP) algorithm for this purpose. Our extensive experiments show that CSMRLP outperforms Mutual Reinforcement-based Label Propagation (MRLP) and five other traditional classification algorithms in AQ classification accuracy, and they demonstrate the effectiveness of our proposed method for AQ prediction. Finally, we provide suggestions on how to compose a question that will receive high AQ, which can be used to improve the QoS of CQAS.
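The abstract does not describe how CSMRLP works internally. As background only, the minimal sketch below shows plain iterative label propagation (the harmonic-function method of Zhu and Ghahramani), the basic semi-supervised technique that label-propagation methods such as MRLP and CSMRLP build on. It is not the authors' algorithm: it has no coupling between feature classes and no mutual reinforcement step. The similarity matrix `W`, the label encoding (1 = high AQ, 0 = low AQ, -1 = unlabeled), and the function name `label_propagation` are illustrative assumptions.

```python
import numpy as np


def label_propagation(W, labels, max_iter=1000, tol=1e-6):
    """Plain iterative label propagation on a similarity graph (illustrative sketch).

    W      : (n, n) symmetric, non-negative similarity matrix between questions
             (hypothetical; how similarity is computed is left open here).
    labels : length-n sequence, 1 = high AQ, 0 = low AQ, -1 = unlabeled.
    Returns a length-n array with the estimated P(high AQ) for every question.
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    labeled = labels >= 0

    # Row-normalize W into a transition matrix; guard isolated nodes against /0.
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    T = W / row_sums

    # Unlabeled questions start at an uninformative 0.5; labeled ones keep their label.
    f = np.where(labeled, labels, 0.5).astype(float)
    for _ in range(max_iter):
        f_new = T @ f                       # each node takes the weighted mean of its neighbors
        f_new[labeled] = labels[labeled]    # clamp the known labels after every step
        converged = np.abs(f_new - f).max() < tol
        f = f_new
        if converged:
            break
    return f


# Usage: a chain of four questions; the first is known high-AQ, the last low-AQ.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
p = label_propagation(W, [1, -1, -1, 0])
print(np.round(p, 3))  # ≈ [1, 0.667, 0.333, 0]
```

The unlabeled questions converge to the harmonic solution, so a question's estimated probability of high AQ reflects how strongly it is connected to known high-AQ versus low-AQ questions. Methods like CSMRLP add further structure on top of this basic propagation step.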
