Quest for the Golden Approach: An Experimental Evaluation of Duplicate Crowdtesting Reports Detection

Background: Because distributed crowdtesting processes are invisible and unpredictable, crowdtesting produces a large number of duplicate reports, and detecting these duplicates is an important task that helps save testing effort. Although many approaches have been proposed to detect duplicates automatically, how they compare with one another and how to adopt them in crowdtesting practice remain unclear. Aims: We aim to conduct the first experimental evaluation of the commonly used and state-of-the-art approaches for detecting duplicate crowdtesting reports, and to explore which is the golden approach. Method: We begin with a systematic review of approaches for duplicate detection and select ten state-of-the-art approaches for our experimental evaluation. We run duplicate detection with each approach on 414 crowdtesting projects with 59,289 reports collected from one of the largest crowdtesting platforms. Results: The machine learning based approach ML-REP and the deep learning based approach DL-BiMPM are the two best approaches for duplicate report detection in crowdtesting, while the latter is more sensitive to the size of the training data and more time-consuming in model training and prediction. Conclusions: This paper provides new insights and guidelines for selecting appropriate techniques for detecting duplicate crowdtesting reports.
