The Effects of Randomly Sampled Training Data on Program Evolution

The effects of randomly sampled training data on genetic programming performance are empirically investigated. Often the most natural, if not only, means of characterizing the target behaviour for a problem is to randomly sample training cases inherent to that problem. A natural question to raise about this strategy is: how deleterious is the random sampling of training data to evolutionary performance? Will sampling reduce the evolutionary search to hill climbing? Can resampling during the run be advantageous? We address these questions by undertaking a suite of different GP experiments. Parameters include various sampling strategies (single sample, re-sampling, ideal samples), generational and steady-state evolution, and non-evolutionary strategies such as hill climbing and random search. The experiments confirm that random sampling effectively characterizes stochastic domains during genetic programming, provided that a sufficiently representative sample is used. An unexpected result is that genetic programming may perform worse than random search when the sampled training sets are exceptionally poor. We conjecture that poor training sets cause evolution to prematurely converge to undesirable optima, which irrevocably handicaps the population's diversity and viability.
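
To make the contrast between the sampling strategies concrete, the sketch below illustrates single-sample versus re-sampled fitness evaluation on a hypothetical stochastic domain (a noisy quadratic stands in for the paper's actual problems; the function names `target`, `fitness_single`, and `fitness_resampled` are illustrative, not the paper's implementation):

```python
import random

def target(x):
    """Hypothetical stochastic target: quadratic plus Gaussian noise,
    standing in for a problem domain that can only be sampled."""
    return x * x + random.gauss(0, 0.5)

def sample_training_cases(n):
    """Randomly sample n (input, output) training cases from the domain."""
    xs = [random.uniform(-2, 2) for _ in range(n)]
    return [(x, target(x)) for x in xs]

def fitness(program, cases):
    """Mean squared error of a candidate program over the training cases
    (lower is better)."""
    return sum((program(x) - y) ** 2 for x, y in cases) / len(cases)

# Single-sample strategy: one fixed training set drawn before the run
# and reused for every fitness evaluation.
fixed_cases = sample_training_cases(50)

def fitness_single(program):
    return fitness(program, fixed_cases)

# Re-sampling strategy: a fresh training set is drawn for each
# evaluation (in practice, often once per generation).
def fitness_resampled(program):
    return fitness(program, sample_training_cases(50))
```

Under the single-sample strategy an unrepresentative draw of `fixed_cases` biases every fitness comparison for the whole run, which is the scenario the conjecture above attributes to premature convergence; re-sampling trades that bias for a noisier, non-stationary fitness signal.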