Code Defenders: Crowdsourcing Effective Tests and Subtle Mutants with a Mutation Testing Game

Writing good software tests is difficult and not every developer's favorite occupation. Mutation testing aims to help by seeding artificial faults (mutants) that good tests should identify, and test generation tools help by providing automatically generated tests. However, mutation tools tend to produce huge numbers of mutants, many of which are trivial, redundant, or semantically equivalent to the original program. Automated test generation tools, in turn, tend to produce tests that achieve good code coverage but are otherwise weak and have no clear purpose. In this paper, we present an approach based on gamification and crowdsourcing to produce better software tests and mutants: the web-based Code Defenders game lets teams of players compete over a program, where attackers try to create subtle mutants, which the defenders try to counter by writing strong tests. Experiments in controlled and crowdsourced scenarios reveal that writing tests as part of the game is more enjoyable than writing them outside it, and that playing Code Defenders results in stronger test suites and mutants than those produced by automated tools.
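The attacker/defender idea can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the function `is_adult`, its mutant, the two tests, and the helper `kills` are not taken from the Code Defenders code base, and Python is used here purely for brevity. The sketch shows a mutant that survives a test with full line coverage but is detected ("killed") by a test that checks the boundary value.

```python
# Illustrative sketch only: all names here are hypothetical and not part of
# Code Defenders itself.

def is_adult(age):
    """Original program under test."""
    return age >= 18


def is_adult_mutant(age):
    """Attacker's mutant: the operator >= has been replaced by >."""
    return age > 18


def weak_test(impl):
    """Executes every line of the function but never probes the boundary."""
    return impl(30) is True and impl(5) is False


def strong_test(impl):
    """Defender's test: also checks age 18, where original and mutant differ."""
    return impl(30) is True and impl(5) is False and impl(18) is True


def kills(test, original, mutant):
    """A test kills a mutant if it passes on the original but fails on the mutant."""
    return test(original) and not test(mutant)


print(kills(weak_test, is_adult, is_adult_mutant))    # False: the mutant survives
print(kills(strong_test, is_adult, is_adult_mutant))  # True: the mutant is killed
```

In this toy setting, the mutant is "subtle" in the sense that it only differs from the original at a single input, so coverage alone does not expose it.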
