Effective Automated Decision Support for Managing Crowdtesting

Crowdtesting has grown into an effective alternative to traditional testing, especially for mobile applications. However, crowdtesting is inherently hard to manage. Given the complexity of mobile applications and the unpredictability of the distributed, parallel crowdtesting process, it is difficult to estimate (a) the number of remaining bugs as yet undetected or (b) the cost required to find those bugs. Experience-based decisions may result in an ineffective crowdtesting process. This paper explores automated decision support for managing the crowdtesting process effectively. The proposed ISENSE applies an incremental sampling technique to process crowdtesting reports arriving in chronological order, organizes them into fixed-size groups as dynamic inputs, and predicts two test-completion indicators in an incremental manner. The two indicators are: 1) the total number of bugs, predicted with a Capture-ReCapture (CRC) model, and 2) the test cost required to achieve certain test objectives, predicted with an AutoRegressive Integrated Moving Average (ARIMA) model. We assess ISENSE using 46,434 reports from 218 crowdtesting tasks on one of the largest crowdtesting platforms in China. Its effectiveness is demonstrated through two applications for automating crowdtesting management: automation of the task-closing decision, and semi-automation of task-closing trade-off analysis. The results show that decision automation using ISENSE gives managers greater opportunity to achieve cost-effectiveness gains in crowdtesting. Specifically, a median of 100% of bugs can be detected while saving 30% of the cost, based on the automated close prediction.
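To make the described pipeline concrete, the following is a minimal Python sketch, not the authors' implementation: it uses the Chao1 estimator as one representative CRC model, treats the per-group count of newly detected bugs as the series fed to ARIMA (via statsmodels), and measures cost as the number of processed reports. GROUP_SIZE, the ARIMA order, the report encoding, and the stopping threshold are all illustrative assumptions.

# Minimal sketch of an ISENSE-style pipeline (not the authors' code).
# Assumption: reports arrive chronologically as bug_ids, with None for
# reports that revealed no bug; duplicate bug_ids are repeat captures.
from collections import Counter

from statsmodels.tsa.arima.model import ARIMA

GROUP_SIZE = 10  # hypothetical fixed group size for incremental sampling


def chao1_total_bugs(bug_ids):
    """Chao1 capture-recapture estimate of the total bug population."""
    counts = Counter(b for b in bug_ids if b is not None)
    observed = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # bugs captured once
    f2 = sum(1 for c in counts.values() if c == 2)  # bugs captured twice
    if f2 == 0:
        return observed + f1 * (f1 - 1) / 2.0  # bias-corrected variant
    return observed + f1 * f1 / (2.0 * f2)


def incremental_indicators(reports):
    """Process reports in fixed-size groups, re-estimating the total-bug
    indicator and recording new-bug counts after each group arrives."""
    seen = []
    crc_estimates = []   # CRC estimate of total bugs after each group
    bugs_per_group = []  # new-bug counts per group (input series for ARIMA)
    for start in range(0, len(reports), GROUP_SIZE):
        group = reports[start:start + GROUP_SIZE]
        known = {b for b in seen if b is not None}
        new_bugs = {b for b in group if b is not None} - known
        bugs_per_group.append(len(new_bugs))
        seen.extend(group)
        crc_estimates.append(chao1_total_bugs(seen))
    return crc_estimates, bugs_per_group


def forecast_remaining_cost(bugs_per_group, horizon=20):
    """Fit ARIMA to the per-group detection series and forecast how many
    more reports are needed before new detections effectively stop."""
    model = ARIMA(bugs_per_group, order=(1, 1, 1))  # order is an assumption
    forecast = model.fit().forecast(steps=horizon)
    for step, value in enumerate(forecast, start=1):
        if value < 0.5:  # hypothetical threshold for "no more new bugs"
            return step * GROUP_SIZE
    return horizon * GROUP_SIZE

Under these assumptions, a task-closing decision could be triggered once the cumulative number of distinct detected bugs approaches the CRC estimate and the ARIMA-forecast cost of further detection becomes large; how the paper combines the two indicators may differ in detail.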
