JTeC: A Large Collection of Java Test Classes for Test Code Analysis and Processing

The recent push towards test automation and test-driven development keeps increasing the amount of test code that must be maintained, analysed, and processed side by side with production code. As a consequence, on the one hand, regression testing techniques (e.g., test suite prioritization or test case selection) capable of handling such large-scale test suites become indispensable; on the other hand, because test code exhibits its own characteristics, specific techniques for its analysis and refactoring are actively sought. We present JTeC, a large-scale dataset of test cases that researchers can use to benchmark the above techniques or any other type of tool expressly targeting test code. JTeC collects more than 2.5M test classes belonging to 31K+ GitHub projects, amounting to more than 430 million SLOC of ready-to-use real-world test code.
