Prediction of relatedness in Stack Overflow: deep learning vs. SVM: a reproducibility study

Background: Xu et al. used a deep neural network (DNN) technique to classify the degree of relatedness between two knowledge units (question-answer threads) on Stack Overflow. More recently, extending Xu et al.'s work, Fu and Menzies proposed a simpler classification technique based on a fine-tuned support vector machine (SVM) that achieves similar performance in much less time. They therefore suggested that researchers should compare their sophisticated methods against simpler alternatives.

Aim: The aim of this work is to replicate the previous studies and further investigate the validity of Fu and Menzies' claim by evaluating the DNN- and SVM-based approaches on a larger dataset. We also compare the effectiveness of these two approaches against SimBow, a lightweight SVM-based method that was previously used for general community question answering.

Method: We (1) collect a large dataset containing knowledge units from Stack Overflow, (2) show the value of the new dataset in addressing the shortcomings of the original one, (3) re-evaluate both the DNN- and SVM-based approaches on the new dataset, and (4) compare the performance of the two approaches against that of SimBow.

Results: We find that (1) there are several limitations in the original dataset used in the previous studies; (2) the effectiveness of both Xu et al.'s and Fu and Menzies' approaches, as measured by F1-score, drops sharply on the new dataset; (3) consistent with the previous finding, the performance of the SVM-based approaches (Fu and Menzies' approach and SimBow) is slightly better than that of the DNN-based approach; (4) contrary to the previous findings, Fu and Menzies' approach runs much slower than the DNN-based approach on the larger dataset, and its runtime grows sharply as the dataset size increases; and (5) SimBow outperforms both Xu et al.'s and Fu and Menzies' approaches in terms of runtime.

Conclusion: We conclude that, for this task, simpler approaches based on SVM perform adequately. We also illustrate the challenges brought by the increased size of the dataset and show the benefit of a lightweight SVM-based approach for this task.
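
Since effectiveness above is reported as F1-score, the following is a minimal sketch of how per-class and macro-averaged F1 can be computed for a multi-class relatedness classifier. It is not the evaluation code of any of the studies: the function names and the class labels ("high", "low", "none") are hypothetical and chosen only for illustration, and the averaging scheme used in the studies is not stated here.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (0.0 for empty input)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical relatedness labels, for illustration only.
y_true = ["high", "high", "low", "none"]
y_pred = ["high", "low", "low", "none"]
print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro averaging weights every class equally, so a sharp drop on a rare relatedness class lowers the overall score even when the frequent classes are predicted well.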
