Local-based active classification of test report to assist crowdsourced testing

In crowdsourced testing, an important task is to identify, among the large number of test reports submitted by crowd workers, those that actually reveal faults (true faults). Most existing approaches to this problem use supervised machine learning techniques, which typically require users to manually label a large amount of training data, a time-consuming and labor-intensive process. Reducing this labeling burden while still achieving good performance is therefore crucial. Active learning, which aims to train a good classifier with as little labeled data as possible, is one promising technique for addressing this challenge. Nevertheless, our observations on real industrial data reveal that existing active learning approaches perform poorly and unstably on crowdsourced testing data. We analyze the underlying cause and find that the data exhibit significant local bias. To address these problems, we propose LOcal-based Active ClassiFication (LOAF) to classify true faults from crowdsourced test reports. LOAF recommends a small set of instances that are the most informative within their local neighborhoods, asks the user for their labels, and then learns classifiers based on those local neighborhoods. Our evaluation on 14,609 test reports from 34 commercial projects on one of the largest crowdsourced testing platforms in China shows that LOAF produces promising results. Its performance even surpasses that of existing supervised learning approaches built on large amounts of labeled historical data. Moreover, we implement our approach and evaluate its usefulness in real-world case studies; feedback from testers demonstrates its practical value.
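To make the query-label-classify workflow concrete, below is a minimal sketch of a local-neighborhood active-learning loop in the spirit of LOAF. It is an illustration, not the paper's implementation: the TF-IDF features, the k-nearest-neighbor notion of locality, the uncertainty-based query criterion, the logistic-regression base learner, and the ask_label callback (standing in for the human labeling step) are all assumptions made for the sketch.

# A minimal sketch (not the authors' implementation) of a local-neighborhood
# active-learning loop in the spirit of LOAF. The feature extraction (TF-IDF),
# locality notion (k nearest neighbors), query criterion (uncertainty sampling)
# and base learner (logistic regression) are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def loaf_like_classification(reports, ask_label, budget=50, k=20):
    """Query labels for informative reports, then classify the rest locally."""
    X = TfidfVectorizer(stop_words="english").fit_transform(reports)
    n = X.shape[0]
    labels = np.full(n, -1)                      # -1 means "not yet labeled"
    rng = np.random.default_rng(0)

    for _ in range(budget):                      # spend the labeling budget
        unlabelled = np.flatnonzero(labels == -1)
        labelled = np.flatnonzero(labels >= 0)
        if unlabelled.size == 0:
            break
        if np.unique(labels[labelled]).size < 2:
            query = rng.choice(unlabelled)       # cold start: random query
        else:
            clf = LogisticRegression(max_iter=1000).fit(X[labelled], labels[labelled])
            proba = clf.predict_proba(X[unlabelled])[:, 1]
            query = unlabelled[np.argmin(np.abs(proba - 0.5))]  # most uncertain
        labels[query] = ask_label(query)         # ask the user for this label

    # Classify each remaining report from the labeled reports in its
    # k-nearest-neighbor neighborhood (a stand-in for LOAF's local classifiers).
    nn = NearestNeighbors(n_neighbors=min(k, n)).fit(X)
    preds = labels.copy()
    for i in np.flatnonzero(labels == -1):
        _, idx = nn.kneighbors(X[i])
        neigh_labels = labels[idx.ravel()]
        neigh_labels = neigh_labels[neigh_labels >= 0]
        preds[i] = int(neigh_labels.mean() >= 0.5) if neigh_labels.size else 0
    return preds

In practice, ask_label would present the selected test report to a human inspector and return 1 for a true fault and 0 otherwise; the final predictions combine the queried labels with the locally inferred ones.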
