SOQDE: A Supervised Learning Based Question Difficulty Estimation Model for Stack Overflow

Stack Overflow (SO), the most popular community Q&A site, rewards answerers with reputation scores to encourage answers from volunteer participants. However, irrespective of the difficulty of a question, the contributor of an accepted answer is awarded the same reputation score, which may discourage a user from spending the additional effort needed to answer a difficult question. To facilitate a question-difficulty-aware reward system, this study proposes SOQDE (Stack Overflow Question Difficulty Estimation), a supervised-learning-based question difficulty estimation model for Stack Overflow. To design SOQDE, we randomly selected 936 questions from an SO data dump exported in September 2017. Two of the authors independently labeled each question as basic, intermediate, or advanced; conflicting labels were resolved through a tie-breaking vote from a third author. We performed an empirical study to determine how the difficulty of a question impacts its outcomes, such as number of votes and resolution time. Our results suggest that the answers to a basic question receive more votes and therefore generate more reputation points for an answerer. Because the incentives are smaller relative to the effort an answerer must spend, intermediate and advanced questions encounter significantly longer delays than basic questions, which further validates the need for a model like SOQDE. To build our model, we identified textual and contextual features of a question and divided them into two categories: pre-hoc and post-hoc features. Using only answer-independent pre-hoc features, a model based on Random Forest achieved the highest mean accuracy (67.6%). By adding answer-dependent post-hoc features, we improved the mean accuracy of our model to 75.2%.
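The labeling protocol described above (two independent annotators, with a third author breaking ties) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `resolve_label`, the `LEVELS` tuple, and the rule that the tie-breaker must choose one of the two conflicting labels are assumptions made for the example.

```python
from typing import Optional

# The three difficulty categories named in the abstract.
LEVELS = ("basic", "intermediate", "advanced")


def resolve_label(first: str, second: str, tie_breaker: Optional[str] = None) -> str:
    """Return the final difficulty label for one question.

    `first` and `second` are the two independent annotations.
    `tie_breaker` is the third author's vote and is consulted only
    when the first two disagree (hypothetical rule: it must match
    one of the two conflicting labels).
    """
    for label in (first, second):
        if label not in LEVELS:
            raise ValueError(f"unknown label: {label!r}")
    if first == second:
        return first  # annotators agree; no third vote is needed
    if tie_breaker not in (first, second):
        raise ValueError("tie-breaker must pick one of the two conflicting labels")
    return tie_breaker
```

For example, `resolve_label("basic", "advanced", tie_breaker="advanced")` yields `"advanced"`, while two matching annotations are returned unchanged without consulting the third vote.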
