Predicting the quality of questions on Stackoverflow

Community Question Answering websites (CQA) have a growing popularity as a way of providing and searching of information. CQA attract users as they provide a direct and rapid way to find the desired information. As recognizing good questions can improve the CQA services and the user’s experience, the current study focuses on question quality instead. Specifically, we predict question quality and investigate the features which influence it. The influence of the question tags, length of the question title and body, presence of a code snippet, the user reputation and terms used to formulate the question are tested. For each set of dependent variables, Ridge regression models are estimated. The results indicate that the inclusion of terms in the models improves their predictive power. Additionally, we investigate which lexical terms determine high and low quality questions. The terms with the highest and lowest coefficients are semantically analyzed. The analysis shows that terms predicting high quality are terms expressing, among others, excitement, negative experience or terms regarding exceptions. Terms predicting low quality questions are terms containing spelling errors or indicating off-topic questions and interjections.

[1]  Michael R. Lyu,et al.  Analyzing and predicting question quality in community question answering services , 2012, WWW.

[2]  Frank Maurer,et al.  What makes a good code example?: A study of programming Q&A in StackOverflow , 2012, 2012 28th IEEE International Conference on Software Maintenance (ICSM).

[3]  A. Hartmann,et al.  Exploring nonlinear relations: Models of clinical decision making by regression with optimal scaling , 2009, Psychotherapy research : journal of the Society for Psychotherapy Research.

[4]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[5]  Priscilla S. Markwood,et al.  The Long Tail: Why the Future of Business is Selling Less of More , 2006 .

[6]  Jure Leskovec,et al.  Discovering value from community activity on focused question answering sites: a case study of stack overflow , 2012, KDD.

[7]  F. Maxwell Harper,et al.  Exploring Question Selection Bias to Identify Experts and Potential Experts in Community Question Answering , 2012, TOIS.

[8]  Dewayne E. Perry,et al.  Toward understanding the causes of unanswered questions in software information sites: a case study of stack overflow , 2013, ESEC/FSE 2013.

[9]  Chris Anderson,et al.  The Long Tail: Why the Future of Business is Selling Less of More , 2006 .

[10]  Arthur E. Hoerl,et al.  Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.

[11]  Danyel Fisher,et al.  Visualizing the Signatures of Social Roles in Online Discussion Groups , 2007, J. Soc. Struct..

[12]  A. E. Hoerl,et al.  Ridge Regression: Applications to Nonorthogonal Problems , 1970 .

[13]  Chanchal Kumar Roy,et al.  Answering questions about unanswered questions of Stack Overflow , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[14]  Fred P. Brooks,et al.  The Mythical Man-Month , 1975, Reliable Software.

[15]  Yong Yu,et al.  Analyzing and Predicting Not-Answered Questions in Community-based Question Answering Services , 2011, AAAI.

[16]  Wibke Weber Text Visualization - What Colors Tell About a Text , 2007, 2007 11th International Conference Information Visualization (IV '07).

[17]  Gilad Mishne,et al.  Finding high-quality content in social media , 2008, WSDM '08.

[18]  Andy P. Field,et al.  Discovering Statistics Using SPSS , 2000 .

[19]  A. E. Hoerl,et al.  Ridge regression: biased estimation for nonorthogonal problems , 2000 .

[20]  T. W. Korner,et al.  What makes a good code , 1988 .

[21]  Christoph Treude,et al.  How do programmers ask and answer questions on the web?: NIER track , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[22]  Ashish Sureka,et al.  Chaff from the wheat: characterization and modeling of deleted questions on stack overflow , 2014, WWW.