Genetic Algorithms and the Automatic Generation of Test Data

Although it is well understood to be a generally undecidable problem, a number of attempts have been made over the years to develop systems that automatically generate test data to achieve a given level of coverage (branch coverage, for example). These approaches have ranged from early attempts at symbolic execution to more recent dynamic approaches and, despite their variety and varying degrees of success, all the systems developed have involved a detailed analysis of the program or system under test. In a departure from this approach, this paper describes a system developed to explore the use of genetic algorithms to generate test data that automatically meets a required level of coverage. Genetic algorithms are commonly applied to search problems within AI. They maintain a population of structures that evolve according to rules of selection, mutation and reproduction. Each individual receives a measure of its fitness in the environment. Reproduction selects individuals with high fitness values in the population and, through crossover and mutation of such individuals, a new population is derived in which individuals may be even better fitted to their environment. Translating these concepts to the problem of test data generation, the population is the set of test data, each element in the set (e.g. a group of data items used in one run of the program) is an individual, and the fitness of an individual corresponds to the coverage it achieves of the program under test. A system has been developed to support this process. It takes the program to be tested (currently in C) and instruments it with probes to provide feedback on the coverage achieved. The system creates an initial population of random data based on a description of the input data and then performs an iterative search, which involves running this data and measuring its coverage and hence its fitness. A sample of this population is selected, depending on the fitness value, to go forward to the new population, and a proportion of this new population is then subjected to mutation and crossover. The process is then applied again until a maximum level of fitness is reached by the test data. The details of the system are described within the paper and its application is demonstrated on several programs. The paper concludes with an evaluation of the system so far and a plan of future work, such as stopping the system from trying to find solutions in the presence of infeasible paths.
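The search loop outlined above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the system described in the paper, which instruments C programs. The program under test (`classify_triangle`), the probe names, the fitness-proportionate selection scheme, the single-point crossover, the parameter values and the stopping rule based on cumulative coverage across the population are all assumptions made purely for illustration.

```python
import random

ALL_PROBES = {"not_triangle", "triangle", "equilateral", "isosceles", "scalene"}

def classify_triangle(a, b, c, hits):
    """Hypothetical program under test; each hits.add(...) acts as a coverage probe."""
    if a + b <= c or a + c <= b or b + c <= a:
        hits.add("not_triangle")
        return "not a triangle"
    hits.add("triangle")
    if a == b == c:
        hits.add("equilateral")
        return "equilateral"
    if a == b or b == c or a == c:
        hits.add("isosceles")
        return "isosceles"
    hits.add("scalene")
    return "scalene"

def run_with_probes(individual):
    """Run one individual (the data items for one run) and return the probes it hit."""
    hits = set()
    classify_triangle(*individual, hits)
    return hits

def fitness(individual):
    """Assumed fitness: fraction of all probes reached by this individual's run."""
    return len(run_with_probes(individual)) / len(ALL_PROBES)

def search(pop_size=30, max_generations=200, crossover_rate=0.5,
           mutation_rate=0.2, low=1, high=20, seed=0):
    rng = random.Random(seed)
    # Initial population of random data drawn from an assumed input description:
    # three integers in [low, high].
    population = [tuple(rng.randint(low, high) for _ in range(3))
                  for _ in range(pop_size)]
    covered = set()   # probes hit by any individual so far
    suite = {}        # probe -> first input observed to hit it
    for _ in range(max_generations):
        scores = []
        for individual in population:
            hits = run_with_probes(individual)
            for probe in hits - covered:
                suite[probe] = individual
            covered |= hits
            scores.append(len(hits) / len(ALL_PROBES))
        if covered == ALL_PROBES:
            break  # assumed stopping rule: every probe has been reached
        # Selection: sample the next population in proportion to fitness.
        population = rng.choices(population, weights=scores, k=pop_size)
        # A proportion of the new population undergoes crossover and mutation.
        offspring = []
        for individual in population:
            child = list(individual)
            if rng.random() < crossover_rate:
                mate = rng.choice(population)
                point = rng.randint(1, 2)  # single-point crossover
                child = child[:point] + list(mate[point:])
            if rng.random() < mutation_rate:
                child[rng.randrange(3)] = rng.randint(low, high)
            offspring.append(tuple(child))
        population = offspring
    return covered, suite

if __name__ == "__main__":
    covered, suite = search()
    print(sorted(covered))
    print(suite)
```

The generation limit in this sketch also serves as a crude guard against the infeasible-path problem mentioned above: if some probe can never be reached, the loop still terminates and reports the coverage actually achieved.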