This paper describes a heuristic approach to finding an optimal stationary policy for a standard finite Markov decision process (MDP). For an MDP, the well-known policy-improvement algorithm can be used to determine optimal policies: it starts from an arbitrary policy f_0 and produces a sequence of improved policies f_1, f_2, f_3, ..., f_k until an optimal policy is reached. In this paper, we propose using a genetic algorithm to search for the best policy, which can then be regarded as an optimal policy. The method is a three-stage cyclic process consisting of reproduction (selection), recombination (mating), and evaluation (survival of the fittest); the cycle terminates when a convergence condition is met. The individual with the highest fitness represents the best policy. Finally, the computational advantages of the genetic algorithm approach are discussed.
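
The three-stage cycle described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the 3-state, 2-action MDP and its numbers, the discounted-reward criterion with `GAMMA = 0.9`, the fitness function (mean discounted value over states), tournament selection, one-point crossover, the mutation rate, and the `patience`-based convergence condition. Each individual is a deterministic stationary policy encoded as a tuple holding one action per state.

```python
import random

# Hypothetical 3-state, 2-action MDP (illustrative numbers, not from the paper).
# P[s][a] is the next-state distribution; R[s][a] is the expected one-step reward.
P = [
    [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0]],
    [[0.0, 0.5, 0.5], [0.7, 0.0, 0.3]],
    [[0.1, 0.0, 0.9], [0.0, 0.4, 0.6]],
]
R = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
GAMMA = 0.9  # assumed discount factor
N_STATES, N_ACTIONS = 3, 2


def fitness(policy, sweeps=200):
    """Approximate mean discounted value of a policy by iterative evaluation."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [
            R[s][policy[s]]
            + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(N_STATES)
        ]
    return sum(v) / N_STATES


def tournament(pop, rng):
    """Reproduction (selection): return the fitter of two random individuals."""
    a, b = rng.choice(pop), rng.choice(pop)
    return a if fitness(a) >= fitness(b) else b


def crossover(a, b, rng):
    """Recombination (mating): one-point crossover of two policies."""
    cut = rng.randrange(1, N_STATES)
    return a[:cut] + b[cut:]


def mutate(policy, rng, rate=0.1):
    """Resample each state's action with a small probability."""
    return tuple(
        rng.randrange(N_ACTIONS) if rng.random() < rate else a for a in policy
    )


def genetic_search(pop_size=8, patience=10, max_gens=200, seed=0):
    rng = random.Random(seed)
    pop = [
        tuple(rng.randrange(N_ACTIONS) for _ in range(N_STATES))
        for _ in range(pop_size)
    ]
    best, stale = max(pop, key=fitness), 0
    for _ in range(max_gens):
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = children + [best]  # evaluation with elitism: the best policy survives
        champion = max(pop, key=fitness)
        if fitness(champion) > fitness(best) + 1e-12:
            best, stale = champion, 0
        else:
            stale += 1
        if stale >= patience:  # convergence: no improvement for `patience` generations
            break
    return best, fitness(best)


if __name__ == "__main__":
    policy, value = genetic_search()
    print("best policy found:", policy, "fitness:", round(value, 4))
```

On an MDP this small, all eight deterministic policies can be enumerated directly, so the sketch only shows the mechanics. A genetic search of this kind is a heuristic and is not guaranteed to return an optimal policy. Its appeal lies in problems where the policy space is too large to enumerate.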