On the value of combining feature subset selection with genetic algorithms: faster learning of coverage models

The next challenge for the PROMISE community is to scale up and speed up model generation to meet the size and time constraints of modern software development projects. There will always be a trade-off between completeness and runtime speed. Here we explore that trade-off in the context of using genetic algorithms (GAs) to learn coverage models, i.e., biases in the control structures of randomized test generators. After applying feature subset selection to logs of the GA output, we find that we can generate the coverage model and run the resulting test suite ten times faster while losing only 6% of the test case coverage.
