EDA-RL: estimation of distribution algorithms for reinforcement learning problems

By making use of probabilistic models, Estimation of Distribution Algorithms (EDAs) can outperform conventional evolutionary computation. In this paper, EDAs are extended to solve reinforcement learning problems, which arise naturally in the framework of autonomous agents. In reinforcement learning problems, we have to find better policies for agents such that their future rewards are increased. In general, such a policy can be represented by conditional probabilities of the agents' actions given their perceptual inputs. In order to estimate such a conditional probability distribution, Conditional Random Fields (CRFs) by Lafferty et al. are newly introduced into EDAs in this paper. The reason for adopting CRFs is that CRFs are able to learn conditional probability distributions from a large amount of input-output data, i.e., episodes in the case of reinforcement learning problems, whereas conventional reinforcement learning algorithms can only learn incrementally. Computer simulations of Probabilistic Transition Problems and Perceptual Aliasing Maze Problems show the effectiveness of EDA-RL.
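
The loop described above (collect episodes, select the better-performing ones, and re-estimate the conditional action distribution from them) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete perceptual inputs and actions, a tabular log-linear (CRF-style) model of P(action | input), and truncation selection; the names LogLinearPolicy, eda_rl, and the env_rollout callback, as well as the population and selection sizes, are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class LogLinearPolicy:
    """Tabular log-linear model: P(a | s) = exp(theta[s, a]) / sum_a' exp(theta[s, a'])."""

    def __init__(self, n_states, n_actions):
        self.theta = np.zeros((n_states, n_actions))

    def probs(self, s):
        return softmax(self.theta[s])

    def sample_action(self, s, rng):
        return rng.choice(self.theta.shape[1], p=self.probs(s))

    def fit(self, episodes, lr=0.5, iters=50):
        """Maximum-likelihood fit to the (state, action) pairs of the selected episodes."""
        for _ in range(iters):
            grad = np.zeros_like(self.theta)
            for episode in episodes:
                for s, a in episode:
                    grad[s, a] += 1.0        # empirical count of the observed action
                    grad[s] -= self.probs(s)  # expected counts under the current model
            self.theta += lr * grad / max(1, len(episodes))

def eda_rl(env_rollout, n_states, n_actions, pop=50, top=10, gens=30, seed=0):
    """Sketch of the EDA-RL cycle: sample episodes, select the best by return,
    re-estimate P(action | input) from the selected episodes.

    env_rollout(policy, rng) is a hypothetical user-supplied callback that runs one
    episode and returns (list of (state, action) pairs, total reward)."""
    rng = np.random.default_rng(seed)
    policy = LogLinearPolicy(n_states, n_actions)
    for _ in range(gens):
        results = [env_rollout(policy, rng) for _ in range(pop)]
        results.sort(key=lambda r: r[1], reverse=True)       # rank by total reward
        policy.fit([episode for episode, _ in results[:top]])  # truncation selection
    return policy
```

In a setting such as the Perceptual Aliasing Maze Problems mentioned above, the tabular parameters would be replaced by CRF feature functions over input-action pairs, but the estimate-from-selected-episodes cycle stays the same.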

[1] Siddhartha Shakya, et al. Using a Markov network model in a univariate EDA: an empirical cost-benefit analysis, 2005, GECCO '05.

[2] J. A. Lozano, et al. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, 2001.

[3] H. Handa, et al. Evolutionary fuzzy systems for generating better Ms.PacMan players, 2008, 2008 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence).

[4] Isao Ono, et al. A Genetic Algorithm for Automatically Designing Modular Reinforcement Learning Agents, 2000, GECCO.

[5] Andrew McCallum, et al. An Introduction to Conditional Random Fields for Relational Learning, 2007.

[6] J. McCall, et al. Incorporating a Metropolis method in a distribution estimation using Markov random field algorithm, 2005, 2005 IEEE Congress on Evolutionary Computation.

[7] Roberto Santana, et al. Estimation of Distribution Algorithms with Kikuchi Approximations, 2005, Evolutionary Computation.

[8] Heinz Mühlenbein, et al. FDA - A Scalable Evolutionary Algorithm for the Optimization of Additively Decomposed Functions, 1999, Evolutionary Computation.

[9] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[10] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[11] Martin V. Butz, et al. Studying XCS/BOA learning in Boolean functions: structure encoding and random Boolean functions, 2006, GECCO '06.

[12] Peter Dayan, et al. Q-learning, 1992, Machine Learning.