Reinforcement learning by GA using importance sampling

[1] Shigenobu Kobayashi, et al. Reinforcement learning of walking behavior for a four-legged robot, 2001, Proceedings of the 40th IEEE Conference on Decision and Control.

[2] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.

[3] Shigenobu Kobayashi, et al. Theoretical analysis of the unimodal normal distribution crossover for real-coded genetic algorithms, 1998, Proceedings of the 1998 IEEE International Conference on Evolutionary Computation.

[4] Risto Miikkulainen, et al. Confidence Based Dual Reinforcement Q-Routing: An adaptive online network routing algorithm, 1999, IJCAI.

[5] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.

[6] Richard Bellman. Dynamic Programming, 1957, Princeton University Press.

[7] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.

[8] Hiroaki Satoh, et al. Minimal generation gap model for GAs considering both exploration and exploitation, 1996.

[9] Isao Ono, et al. A Real Coded Genetic Algorithm for Function Optimization Using Unimodal Normal Distributed Crossover, 1997, ICGA.

[10] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[11] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.

[12] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.

[13] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.

[14] Michael L. Littman, et al. Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach, 1993, NIPS.

[15] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.