Towards an Accurate Prediction of the Question Quality on Stack Overflow using a Deep-Learning-Based NLP Approach

Online question answering (Q&A) forums like Stack Overflow have been playing an increasingly important role in supporting the daily tasks of developers. Stack Overflow can be considered a meeting point for experienced developers and those who are looking for a solution to a specific problem. Since anyone with any background and experience level can ask and respond to questions, the community uses different measures to maintain quality, such as closing and deleting inappropriate posts. With more than 8,000 posts arriving on Stack Overflow every day, effective automatic filtering is essential. In this paper, we present a novel approach for classifying questions based exclusively on their linguistic and semantic features using a deep learning method. Our binary classifier, which relies only on the textual properties of posts, can predict whether a question will be closed with an accuracy of 74%, similar to the results of previous metrics-based models. Based on our findings, we conclude that by combining deep learning and natural language processing methods, the maintenance of quality on Q&A forums could be supported using only the raw text of posts.
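
To make the classification task concrete, the following is a minimal sketch of a text-only binary classifier that maps the raw text of a question to a probability of closure. It is not the deep learning model described above: a bag-of-words logistic regression stands in for it, and the posts, labels, function names, and hyperparameters are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy posts (not real Stack Overflow data); 1 = closed, 0 = left open.
POSTS = [
    "plz help urgent my code not working",
    "Why does reallocation of a std::vector invalidate iterators?",
    "send me full code for homework asap",
    "How can I parse ISO 8601 timestamps with timezone offsets in Python?",
    "which language is best",
    "What causes a NullPointerException when unboxing an Integer in Java?",
]
LABELS = [1, 0, 1, 0, 1, 0]


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(token_counts, weights, bias):
    """Linear score: bias plus the weighted token counts."""
    return bias + sum(weights[t] * c for t, c in token_counts.items())


def train(posts, labels, epochs=200, lr=0.5):
    """Fit logistic regression on token counts with stochastic gradient descent."""
    weights = Counter()  # unseen tokens read as weight 0
    bias = 0.0
    features = [Counter(tokenize(p)) for p in posts]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(score(x, weights, bias)) - y  # gradient of the log loss
            bias -= lr * err
            for t, c in x.items():
                weights[t] -= lr * err * c
    return weights, bias


def closure_probability(text, weights, bias):
    """Estimated probability that a question with this text gets closed."""
    return sigmoid(score(Counter(tokenize(text)), weights, bias))


weights, bias = train(POSTS, LABELS)
p_vague = closure_probability("plz help urgent homework", weights, bias)
p_specific = closure_probability("How can I parse timestamps in Java?", weights, bias)
print(f"vague: {p_vague:.2f}  specific: {p_specific:.2f}")
```

On this toy data the vague request receives a higher closure probability than the specific question. A realistic setup would train on labeled Stack Overflow posts, replace the token counts with learned text representations, and report accuracy on held-out questions.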
