On early detection of high voted Q&A on Stack Overflow

Early detection of high quality content on community question answering (CQA) platforms is an important emerging problem whose main goal is to identify high quality questions and answers shortly after their submission. Solving this problem improves question routing, reduces the number of unanswered questions, improves the user experience, and raises the content quality of a CQA platform by allowing low quality content to be rejected. The main challenge is that only a few features have values available shortly after a post is submitted. In other words, unlike previous related research, it is not possible to use a comprehensive set of features to detect high quality content. In this paper, we view content quality from the perspective of the voting outcome. Specifically, we consider Q&A posts that will receive more votes than a certain threshold to be high quality. Analyzing a large amount of data from a CQA platform, we observed two important patterns that help with early detection of high quality content. We call the first pattern the accepted answer effect and the second the answer competition effect. According to the first pattern, a high quality question is more likely than other questions to receive an accepted answer, and vice versa. According to the second pattern, only a few of the answers to a given question will be high quality. We show that these patterns already hold shortly after content is submitted. Using these patterns, we propose a unified relational classification framework to solve the problem. In the proposed framework, the quality of a given question and of its associated answers can be predicted simultaneously soon after their submission.
We conduct several experiments on six data collections gathered from Stack Overflow to demonstrate the effectiveness of the proposed models. Our experiments indicate that the performance of high quality content detection improves by up to 10.7% for questions and 35.3% for answers compared with a state-of-the-art independent classifier. Moreover, we found an average F-measure gain of 1.2% for questions and 11.8% for answers over a recent strong baseline by Yao et al. (2015).
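
The vote-threshold labeling and the two patterns described above can be illustrated with a minimal Python sketch. It is not the authors' relational classification framework. The threshold value, the class and function names, and the toy data are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-off: the abstract only says "a certain threshold".
VOTE_THRESHOLD = 5


@dataclass
class Answer:
    votes: int
    accepted: bool = False


@dataclass
class Question:
    votes: int
    answers: List[Answer]


def is_high_quality(votes: int, threshold: int = VOTE_THRESHOLD) -> bool:
    """Label a post high quality when its votes exceed the threshold."""
    return votes > threshold


def accepted_rate(questions: List[Question]) -> float:
    """Fraction of the given questions that have an accepted answer."""
    if not questions:
        return 0.0
    with_accepted = sum(any(a.accepted for a in q.answers) for q in questions)
    return with_accepted / len(questions)


def high_quality_answer_share(question: Question) -> float:
    """Fraction of a question's answers that are labeled high quality."""
    if not question.answers:
        return 0.0
    high = sum(is_high_quality(a.votes) for a in question.answers)
    return high / len(question.answers)


# Toy data, invented purely to exercise the functions above.
questions = [
    Question(12, [Answer(9, accepted=True), Answer(1), Answer(0)]),
    Question(8, [Answer(7, accepted=True), Answer(2)]),
    Question(1, [Answer(0), Answer(1)]),
    Question(0, [Answer(6, accepted=True)]),
    Question(2, []),
]

high_questions = [q for q in questions if is_high_quality(q.votes)]
other_questions = [q for q in questions if not is_high_quality(q.votes)]

# Accepted answer effect: compare acceptance rates between the two groups.
rate_high = accepted_rate(high_questions)
rate_other = accepted_rate(other_questions)

# Answer competition effect: share of high quality answers for one question.
share_first = high_quality_answer_share(questions[0])
```

On real data, comparing `rate_high` with `rate_other` is one simple way to check whether the accepted answer effect holds, and a small `high_quality_answer_share` across questions would be consistent with the answer competition effect.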

[1] Chanchal Kumar Roy, et al. Mining Duplicate Questions of Stack Overflow, 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[2] Oliver Ferschke, et al. What makes a good biography?: multidimensional quality analysis based on wikipedia article feedback data, 2014, WWW.

[3] Vladimir Zadorozhny, et al. Automatic evaluation of information provider reliability and expertise, 2013, World Wide Web.

[4] Gareth J. F. Jones, et al. The good, the bad and their kins: Identifying questions with negative scores in StackOverflow, 2015, 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[5] Alejandro Figueroa, et al. Search clicks analysis for discovering temporally anchored questions in community Question Answering, 2016, Expert Syst. Appl.

[6] Pasquale Lops, et al. Social Question Answering, 2016, ACM Trans. Inf. Syst.

[7] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[8] Mária Bieliková, et al. A Comprehensive Survey and Classification of Approaches for Community Question Answering, 2016, ACM Trans. Web.

[9] Djoerd Hiemstra, et al. Integration of scientific and social networks, 2013, World Wide Web.

[10] Benno Stein, et al. Predicting quality flaws in user-generated content: the case of wikipedia, 2012, SIGIR '12.

[11] Michele Lanza, et al. Understanding and Classifying the Quality of Technical Forum Questions, 2014, 2014 14th International Conference on Quality Software.

[12] Jie Zhou, et al. Optimal answerer ranking for new questions in community question answering, 2015, Inf. Process. Manag.

[13] Ravi Kumar, et al. Great Question! Question Quality in Community Q&A, 2014, ICWSM.

[14] Eugene Agichtein, et al. Predicting information seeker satisfaction in community question answering, 2008, SIGIR '08.

[15] Ee-Peng Lim, et al. Quality-aware collaborative question answering: methods and evaluation, 2009, WSDM '09.

[16] Kyumin Lee, et al. Detecting experts on Quora: by their activity, quality of answers, linguistic characteristics and temporal behaviors, 2016, Social Network Analysis and Mining.

[17] Dietrich Klakow, et al. Bridging the vocabulary gap between questions and answer sentences, 2015, Inf. Process. Manag.

[18] R. Hinde, et al. Governing the Commons: The Evolution of Institutions for Collective Action, 2010.

[19] David van Dijk, et al. Early Detection of Topical Expertise in Community Question Answering, 2015, SIGIR.

[20] Yue Lu, et al. Exploiting social context for review quality prediction, 2010, WWW '10.

[21] Alton Yeow-Kuan Chua, et al. Answers or no answers: Studying question answerability in Stack Overflow, 2015, J. Inf. Sci.

[22] Gilad Mishne, et al. Finding high-quality content in social media, 2008, WSDM '08.

[23] Alton Yeow-Kuan Chua, et al. Predictors of High-Quality Answers, 2012, Online Inf. Rev.

[24] James Caverlee, et al. Ranking Comments on the Social Web, 2009, 2009 International Conference on Computational Science and Engineering.

[25] Yiqiang Chen, et al. ASELM: Adaptive semi-supervised ELM with application in question subjectivity identification, 2016, Neurocomputing.

[26] Michele Lanza, et al. Improving Low Quality Stack Overflow Post Detection, 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[27] Gareth J. F. Jones, et al. Nearest Neighbour based Transformation Functions for Text Classification: A Case Study with StackOverflow, 2016, ICTIR.

[28] Jian Feng, et al. Predicting the quality of user-generated answers using co-training in community-based question answering portals, 2015, Pattern Recognit. Lett.

[29] Robert E. Kraut, et al. Early detection of potential experts in question answering communities, 2011, UMAP'11.

[30] Daniele Quercia, et al. The Social World of Content Abusers in Community Question Answering, 2015, WWW.

[31] Nicole Novielli, et al. Mining Successful Answers in Stack Overflow, 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[32] Leman Akoglu, et al. Min(e)d your tags: Analysis of Question response time in StackOverflow, 2014, 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014).

[33] F. Maxwell Harper, et al. Exploring Question Selection Bias to Identify Experts and Potential Experts in Community Question Answering, 2012, TOIS.

[34] Pável Calado, et al. A general multiview framework for assessing the quality of collaboratively created content on web 2.0, 2017, J. Assoc. Inf. Sci. Technol.

[35] Geert-Jan Houben, et al. Identification of useful user comments in social media: a case study on flickr commons, 2013, JCDL '13.

[36] Mohamed Bouguessa, et al. Identifying Authorities in Online Communities, 2015, ACM Trans. Intell. Syst. Technol.

[37] Hamid Beigy, et al. Expertise Finding in Bibliographic Network: Topic Dominance Learning Approach, 2014, IEEE Transactions on Cybernetics.

[38] Flavio Figueiredo, et al. Assessing the quality of textual features in social media, 2013, Inf. Process. Manag.

[39] Chanchal Kumar Roy, et al. Answering questions about unanswered questions of Stack Overflow, 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[40] Djoerd Hiemstra, et al. Expert group formation using facility location analysis, 2014, Inf. Process. Manag.

[41] Fernando Batista, et al. Using geolocated tweets for characterization of Twitter in Portugal and the Portuguese administrative regions, 2016, Social Network Analysis and Mining.

[42] Günter Neumann, et al. Context-aware semantic classification of search queries for browsing community question-answering archives, 2016, Knowl. Based Syst.

[43] Feng Xu, et al. Detecting high-quality posts in community question answering sites, 2015, Inf. Sci.

[44] Michael R. Lyu, et al. Analyzing and predicting question quality in community question answering services, 2012, WWW.

[45] Eugene Agichtein, et al. Learning to recognize reliable users and content in social media with coupled mutual reinforcement, 2009, WWW '09.

[46] Tat-Seng Chua, et al. Discovering high quality answers in community question answering archives using a hierarchy of classifiers, 2014, Inf. Sci.

[47] Liqiang Nie, et al. Exploring heterogeneous features for query-focused summarization of categorized community answers, 2016, Inf. Sci.