Genetic algorithm methods for solving the best stationary policy of finite Markov decision processes

This paper describes a heuristic approach to finding the optimal stationary policy of a standard finite Markov decision process (MDP). For an MDP, the so-called policy-improvement algorithm can be used to determine optimal policies: it starts from an arbitrary policy f_0 and produces a sequence of improved policies f_1, f_2, f_3, ..., f_k until an optimal policy is reached. In this paper, we propose using a genetic algorithm to search for the best policy, which can be regarded as an optimal policy. The method is a three-stage cyclic process consisting of reproduction (selection), recombination (mating), and evaluation (survival of the fittest); the cycle terminates when a preset convergence condition is met. The individual with the highest fitness represents the best policy. Finally, the computational advantages of the genetic algorithm approach are discussed.
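
To make the three-stage cycle concrete, here is a minimal sketch of a genetic search over deterministic stationary policies. It is not the paper's implementation. The abstract does not state the optimality criterion, the encoding, or the genetic operators, so the following are all assumptions made for illustration: a policy is encoded as a list with one action index per state; fitness is the average discounted value over states (discount factor `gamma`), computed by iterative policy evaluation; reproduction keeps the fitter half of the population; recombination is one-point crossover with a small mutation rate; and the convergence condition is "no improvement for `patience` generations". The names `evaluate`, `genetic_search`, `P`, and `R` are hypothetical.

```python
import random


def evaluate(policy, P, R, gamma=0.9, sweeps=200):
    """Fitness of a policy: average discounted value over all states.

    P[s][a] is the list of transition probabilities to each next state,
    R[s][a] is the immediate reward (both layouts are assumptions).
    """
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):  # iterative policy evaluation
        v = [
            R[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
    return sum(v) / n


def genetic_search(P, R, n_actions, pop_size=20, generations=50,
                   p_mut=0.05, patience=10, seed=0):
    """Search for a high-fitness policy; assumes pop_size >= 4."""
    rng = random.Random(seed)
    n = len(P)
    pop = [[rng.randrange(n_actions) for _ in range(n)]
           for _ in range(pop_size)]
    best, best_fit, stale = None, float("-inf"), 0
    for _ in range(generations):
        # Evaluation: score every individual, fittest first.
        scored = sorted(((evaluate(ind, P, R), ind) for ind in pop),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_fit + 1e-12:
            best_fit, best, stale = scored[0][0], scored[0][1][:], 0
        else:
            stale += 1
        if stale >= patience:  # assumed convergence condition
            break
        # Reproduction (selection): the fitter half become parents.
        parents = [ind for _, ind in scored[:pop_size // 2]]
        # Recombination (mating): one-point crossover plus mutation.
        children = [parents[0][:]]  # keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            child = [rng.randrange(n_actions) if rng.random() < p_mut else g
                     for g in child]
            children.append(child)
        pop = children
    return best, best_fit
```

Calling `genetic_search(P, R, n_actions)` returns the fittest policy found and its fitness. Unlike policy improvement, this search carries no guarantee of reaching an optimal policy, which is why the abstract describes the result as a best policy that can be regarded as optimal.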