On Partially Observable Markov Decision Processes Using Genetic Algorithm Based Q-Learning

As powerful probabilistic models for optimal policy search, partially observable Markov decision processes (POMDPs) remain difficult to solve because of hidden state and uncertainty in action effects. In this paper, a novel approximate algorithm, Genetic Algorithm based Q-Learning (GAQ-Learning), is proposed to solve POMDP problems. In the proposed methodology, a genetic algorithm maintains a population of policies, while traditional Q-learning supplies predicted rewards that serve as the fitness of the evolved policies. GAQ-Learning addresses the hidden state problem with a novel hybrid method in which historical information is combined with the current belief state to find the optimal policy. Experiments conducted on benchmark domains show that the proposed methodology outperforms other state-of-the-art approximate POMDP methods.
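The interaction the abstract describes, a genetic algorithm evolving a population of policies whose fitness comes from predicted rewards, can be illustrated with a minimal sketch. The toy POMDP, the two-observation history window, and all function names below are illustrative assumptions, not the paper's actual formulation; fitness is approximated here by Monte Carlo rollout returns standing in for Q-learning's reward predictions.

```python
import random

random.seed(0)

# Hypothetical toy POMDP: two hidden states, two actions, noisy observations.
N_STATES, N_ACTIONS, N_OBS = 2, 2, 2

def step(state, action):
    # Assumed dynamics: choosing the action matching the hidden state pays 1.
    reward = 1.0 if action == state else 0.0
    obs = state if random.random() < 0.8 else 1 - state  # 80%-accurate observation
    next_state = random.randrange(N_STATES)
    return next_state, obs, reward

def random_policy():
    # A policy conditions on a short observation history (previous obs,
    # current obs) to mitigate hidden state, echoing the hybrid idea of
    # combining history with current information.
    return {(o1, o2): random.randrange(N_ACTIONS)
            for o1 in range(N_OBS) for o2 in range(N_OBS)}

def fitness(policy, episodes=30, horizon=20):
    # Stand-in for Q-learning's predicted reward: average discounted
    # return over Monte Carlo rollouts of the policy.
    gamma, total = 0.95, 0.0
    for _ in range(episodes):
        state, prev_obs, obs = random.randrange(N_STATES), 0, 0
        for t in range(horizon):
            action = policy[(prev_obs, obs)]
            state, new_obs, r = step(state, action)
            total += (gamma ** t) * r
            prev_obs, obs = obs, new_obs
    return total / episodes

def evolve(pop_size=20, generations=30, mutate_p=0.1):
    population = [random_policy() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population.
        elite = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            # Uniform crossover over the policy table, then point mutation.
            child = {k: random.choice((a[k], b[k])) for k in a}
            for k in child:
                if random.random() < mutate_p:
                    child[k] = random.randrange(N_ACTIONS)
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

best = evolve()
print(fitness(best))
```

The evolved policy should earn a clearly positive discounted return, since an 80%-accurate observation lets a history-conditioned policy pick the rewarding action most of the time.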