Instance-Based Policy Search using Binomial Distribution Crossover and Iterated Refreshment

This paper describes a GA-based, lazy approach to reinforcement learning. The approach employs a data-driven policy composed of an instance set and an instance-based action selector. This representation offers a number of advantages, but some difficulties remain uninvestigated; one of them is the huge and complicated search space. Our idea is that these difficulties can be overcome by preserving the characteristics already present in the GA population while introducing new ones. On the basis of this idea, we propose two genetic operators: Binomial Distribution Crossover (BDX) and iterated refreshment. BDX generates descendants that inherit their parents' characteristics, while iterated refreshment greedily introduces new characteristics. A GA equipped with these operators was applied to benchmark tasks to demonstrate its ability, and each operator was also investigated and discussed from various perspectives. Finally, we provide preferable parameter settings for our method.
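The abstract does not define the two operators precisely, but their descriptions suggest the following minimal sketch. It is an assumption, not the paper's exact method: here BDX is interpreted as a per-slot Bernoulli choice between the two parent instance sets (so the number of slots inherited from one parent follows a binomial distribution), and iterated refreshment is interpreted as a greedy loop that proposes replacement instances and keeps only improving ones. The helper names (`binomial_distribution_crossover`, `iterated_refreshment`) and the fitness interface are hypothetical.

```python
import random


def binomial_distribution_crossover(parent_a, parent_b, p=0.5, rng=None):
    """Hypothetical BDX sketch: each child slot inherits from parent_a
    with probability p, otherwise from parent_b. The count of slots
    taken from parent_a is therefore Binomial(n, p)."""
    rng = rng or random.Random()
    n = min(len(parent_a), len(parent_b))
    return [parent_a[i] if rng.random() < p else parent_b[i]
            for i in range(n)]


def iterated_refreshment(instances, evaluate, propose_instance,
                         trials=5, rng=None):
    """Hypothetical greedy refreshment sketch: repeatedly replace a
    random instance with a newly proposed one, keeping the change only
    when the policy's fitness improves."""
    rng = rng or random.Random()
    best_score = evaluate(instances)
    for _ in range(trials):
        candidate = list(instances)
        candidate[rng.randrange(len(candidate))] = propose_instance()
        score = evaluate(candidate)
        if score > best_score:          # greedy acceptance
            instances, best_score = candidate, score
    return instances
```

Under this reading, the two operators play complementary roles: BDX recombines characteristics already in the population, while refreshment injects new instances and accepts them only when they help.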