A continuous estimation of distribution algorithm by evolving graph structures using reinforcement learning

A novel graph-based Estimation of Distribution Algorithm (EDA) named Probabilistic Model Building Genetic Network Programming (PMBGNP) has been proposed. Like classical EDAs, PMBGNP memorizes the current best individuals and uses them to estimate a distribution from which the new population is generated. Unlike classical EDAs, however, PMBGNP represents its solutions as graph structures, which lets it evolve compact programs. It can therefore solve a range of problems different from those conventionally addressed in the EDA literature, such as data mining and Reinforcement Learning (RL) problems. This paper extends PMBGNP from discrete to continuous search spaces; the extended algorithm is named PMBGNP-AC. Besides evolving the node connections to determine the optimal graph structures, as conventional PMBGNP does, PMBGNP-AC models the continuous variables of the nodes with Gaussian distributions. The mean μ and standard deviation σ are constructed like those of classical continuous Population-Based Incremental Learning (PBILc). However, an RL technique, Actor-Critic (AC), is designed to update these parameters (μ and σ). AC allows us to calculate the Temporal-Difference (TD) error, which evaluates whether the selection of a continuous value turned out better or worse than expected. This scalar reinforcement signal decides whether the tendency to select that continuous value should be strengthened or weakened, and thereby determines the shape of the probability density function of the Gaussian distribution. The proposed algorithm is applied to an RL problem, autonomous robot control, in which the robot's wheel speeds and sensor values are continuous. The experimental results show the superiority of PMBGNP-AC compared with the conventional algorithms.
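The way a TD error can strengthen or weaken the tendency to select a continuous value may be illustrated with a minimal sketch. The abstract does not give the exact update equations of PMBGNP-AC, so the rule below is an assumption: it is the textbook Gaussian actor update (log-likelihood gradient scaled by σ²), not necessarily the authors' method. The function names, the learning rate `alpha`, the discount `gamma`, and the floor `sigma_min` are all hypothetical.

```python
def td_error(reward, value_s, value_next, gamma=0.9):
    """One-step TD error: positive if the outcome was better than expected."""
    return reward + gamma * value_next - value_s


def update_gaussian(mu, sigma, action, delta, alpha=0.1, sigma_min=1e-3):
    """Shift (mu, sigma) of a node's Gaussian using the TD error delta.

    Assumed rule (generic Gaussian actor update, not taken from the paper):
      mu    += alpha * delta * (action - mu)
      sigma += alpha * delta * ((action - mu)^2 - sigma^2) / sigma
    A positive delta pulls mu toward the sampled action; a negative delta
    pushes it away. sigma is floored at sigma_min so it stays positive.
    """
    diff = action - mu
    new_mu = mu + alpha * delta * diff
    new_sigma = sigma + alpha * delta * (diff * diff - sigma * sigma) / sigma
    return new_mu, max(new_sigma, sigma_min)


if __name__ == "__main__":
    # Hypothetical single step: the sampled value 1.0 did better than expected.
    delta = td_error(reward=1.0, value_s=0.5, value_next=0.5)
    print(update_gaussian(mu=0.0, sigma=1.0, action=1.0, delta=delta))
```

In this sketch a sampled value far from μ that earns a positive TD error widens σ, while a well-rewarded value close to μ narrows it, which is one way the reinforcement signal could shape the probability density function.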
