Characterization and Prediction of Questions without Accepted Answers on Stack Overflow

A fast and effective way to obtain information about software development problems is to search for similar solved problems or to post questions on community question answering (CQA) websites. Solving coding problems quickly is important, so these CQA websites have a considerable impact on the software development process. However, if developers do not get the answers they expect, these websites lose their usefulness and software development time increases. Stack Overflow is the most popular CQA website for programming problems. According to its rules, the only sign that a question asker has obtained the desired answer is that user’s acceptance of an answer. In this paper, we investigate unresolved questions, that is, questions without accepted answers, on Stack Overflow. The number of unresolved questions is increasing: as of August 2019, 47% of Stack Overflow questions were unresolved. In this study, we analyze how effectively various features, including some novel ones, indicate whether a question will be resolved. We exclude features that rely on information unavailable at the time a question is asked, such as its answers. To evaluate our features, we train several predictive models on the features of 18 million questions to predict whether a question will get an accepted answer. The results of this study show a significant relationship between our proposed features and getting accepted answers. Finally, we introduce an online tool that predicts whether a question will get an accepted answer. Currently, Stack Overflow’s users do not receive any feedback on their questions before asking them, so they may inadvertently post unclear, unreadable, or inappropriately tagged questions. With this tool, they can revise their questions and tags, compare the resulting predictions, and deliberately improve their questions to increase the chance of getting accepted answers.
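The prediction setup described above (features known at asking time, a model trained on past questions, and a tool that re-scores a revised draft) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three feature names, the eight-row toy dataset, and the choice of a plain logistic regression trained by gradient descent. The abstract does not specify the actual feature set or models, so this should not be read as the authors' implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    # Estimated probability that a question receives an accepted answer.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=3000):
    # Batch gradient descent on the mean logistic loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical ask-time features, scaled to roughly [0, 1]:
# [title_words / 20, has_code_snippet, tag_count / 5]
# Toy data invented for this sketch; label 1 = got an accepted answer.
X = [
    [0.50, 1.0, 0.6], [0.40, 1.0, 0.4], [0.60, 1.0, 0.8], [0.45, 1.0, 0.6],
    [0.20, 0.0, 0.2], [0.90, 0.0, 1.0], [0.10, 0.0, 0.2], [0.70, 0.0, 0.4],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)

# A user revises a draft (here: adds a code snippet) and compares predictions.
draft = [0.5, 0.0, 0.6]
revised = [0.5, 1.0, 0.6]
p_draft = predict_proba(w, b, draft)
p_revised = predict_proba(w, b, revised)
print(f"draft:   {p_draft:.3f}")
print(f"revised: {p_revised:.3f}")
```

In this toy data the label depends only on the hypothetical `has_code_snippet` feature, so the revised draft scores higher by construction. A realistic version would use a much richer feature set and would be evaluated on held-out questions.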
