TSTL: the template scripting testing language

In automated test generation, a test harness defines the set of valid tests for a system, as well as their correctness properties. The difficulty of writing test harnesses is a major obstacle to the adoption of automated test generation and model checking. Languages for writing test harnesses are usually tied to a particular tool, are unfamiliar to programmers, and often limit expressiveness. Writing test harnesses directly in the language of the software under test (SUT) is tedious, repetitive, and error-prone; it offers little or no support for test case manipulation and debugging, and it produces code that is hard to read and hard to maintain. Using existing harness languages or writing directly in the language of the SUT also tends to limit users to one algorithm for test generation, with little ability to explore alternative methods. In this paper, we present TSTL, the template scripting testing language, a domain-specific language (DSL) for writing test harnesses. TSTL compiles harness definitions into an interface for testing, making generic test generation and manipulation tools for all SUTs possible. TSTL includes tools for generating, manipulating, and analyzing test cases, including simple model checkers. This paper motivates TSTL via a large-scale testing effort, directed by an end-user, to find faults in the most widely used geographic information systems tool. This paper emphasizes a new approach to automated testing, in which the aim is not to develop a monolithic tool to be extended, but to convert a test harness into a language extension. This approach makes testing not a separate activity performed with a tool, but something as natural to users of the SUT's language as the use of domain-specific libraries such as ArcPy, NumPy, or QIIME is in their domains. TSTL is a language and tool infrastructure, but it is also a way to bring testing activities under the control of an existing programming language in a simple, natural way.
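The idea of a harness exposed as a programmatic interface, with generic tools written against that interface, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the class `StackHarness`, its methods `restart`, `enabled`, and `check`, and the function `random_test` are hypothetical names, not TSTL syntax and not the interface that TSTL generates. The harness lists guarded actions (the valid tests) and a property (correctness); the random tester knows nothing about the SUT beyond those three methods.

```python
import random


class StackHarness:
    """Hypothetical harness for a toy bounded stack; a plain list stands in for the SUT."""

    CAPACITY = 3

    def restart(self):
        """Reset the SUT and the bookkeeping that the property relies on."""
        self.stack = []
        self.pushes = 0
        self.pops = 0

    def _push(self):
        self.stack.append(self.pushes)
        self.pushes += 1

    def _pop(self):
        self.stack.pop()
        self.pops += 1

    def enabled(self):
        """Return (name, action) pairs whose guards hold; these define the valid tests."""
        actions = []
        if len(self.stack) < self.CAPACITY:
            actions.append(("push", self._push))
        if self.stack:
            actions.append(("pop", self._pop))
        return actions

    def check(self):
        """Correctness property: size matches the action history and respects capacity."""
        size = len(self.stack)
        return size == self.pushes - self.pops and size <= self.CAPACITY


def random_test(harness, steps, rng):
    """Generic random tester: works with any object offering restart/enabled/check.

    Returns the list of action names leading to a property violation, or None.
    """
    harness.restart()
    trace = []
    for _ in range(steps):
        choices = harness.enabled()
        if not choices:
            break
        name, action = rng.choice(choices)
        action()
        trace.append(name)
        if not harness.check():
            return trace
    return None


if __name__ == "__main__":
    # The toy SUT has no fault, so no failing trace is found.
    print(random_test(StackHarness(), 100, random.Random(0)))
```

Because `random_test` depends only on the three harness methods, a different exploration strategy (for example, an exhaustive search over action sequences) could be swapped in without touching the harness, which is the kind of separation the abstract describes.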
