A distributed implementation using Apache Spark of a genetic algorithm applied to test data generation

This paper presents a distributed implementation of a genetic algorithm using Apache Spark, a fast and popular data processing framework. The approach is general, but in this paper the parallelized genetic algorithm is applied to test data generation for executable programs. The viability of the approach is demonstrated on two examples.
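The abstract does not describe the algorithm in detail, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a hypothetical program under test with a target branch `a == b and b == c`, and uses a branch-distance fitness (0 means the branch is covered). All names, including `branch_distance`, `run_ga`, and the `mapper` parameter, are invented for illustration. Fitness evaluation is isolated behind `mapper` because it is the step that a framework such as Spark could distribute across workers.

```python
import random

def branch_distance(candidate):
    """Distance to the hypothetical target branch `a == b and b == c` (0 = covered)."""
    a, b, c = candidate
    return abs(a - b) + abs(b - c)

def evaluate(population, mapper=map):
    """Score every individual; `mapper` is the step that could be distributed.

    Assumption: with PySpark and a SparkContext `sc`, one could pass
    `lambda f, xs: sc.parallelize(xs).map(f).collect()` instead of `map`.
    """
    return list(mapper(branch_distance, population))

def tournament(population, scores, rng, k=3):
    """Return the best of k randomly drawn individuals (lower score is better)."""
    picks = rng.sample(range(len(population)), k)
    return population[min(picks, key=lambda i: scores[i])]

def crossover(p1, p2, rng):
    """Single-point crossover on a 3-gene individual."""
    cut = rng.randint(1, 2)
    return p1[:cut] + p2[cut:]

def mutate(individual, rng, rate=0.2, step=10):
    """Shift each gene by a small random amount with probability `rate`."""
    return tuple(
        g + rng.randint(-step, step) if rng.random() < rate else g
        for g in individual
    )

def run_ga(pop_size=60, generations=200, seed=0, mapper=map):
    """Evolve integer triples toward the target branch; return the best one found."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 100) for _ in range(3)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        scores = evaluate(population, mapper)
        best_i = min(range(pop_size), key=lambda i: scores[i])
        best = population[best_i]
        if scores[best_i] == 0:
            return best  # target branch covered
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = crossover(
                tournament(population, scores, rng),
                tournament(population, scores, rng),
                rng,
            )
            children.append(mutate(child, rng))
        population = children
    scores = evaluate(population, mapper)
    return population[min(range(pop_size), key=lambda i: scores[i])]

if __name__ == "__main__":
    test_input = run_ga()
    print(test_input, branch_distance(test_input))
```

Only the fitness step is parallelized here; selection, crossover, and mutation stay on the driver. Other designs, such as island models with periodic migration, are equally plausible for a distributed genetic algorithm, and the abstract does not say which one the paper uses.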
