Reinforcement learning by GA using importance sampling

[1] Shigenobu Kobayashi, et al. Reinforcement learning of walking behavior for a four-legged robot, 2001, Proceedings of the 40th IEEE Conference on Decision and Control.

[2] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.

[3] Shigenobu Kobayashi, et al. Theoretical analysis of the unimodal normal distribution crossover for real-coded genetic algorithms, 1998, Proceedings of the 1998 IEEE International Conference on Evolutionary Computation.

[4] Risto Miikkulainen, et al. Confidence Based Dual Reinforcement Q-Routing: An adaptive online network routing algorithm, 1999, IJCAI.

[5] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.

[6] Richard Bellman. Dynamic Programming, 1957, Princeton University Press.

[7] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.

[8] Hiroaki Satoh, et al. Minimal generation gap model for GAs considering both exploration and exploitation, 1996.

[9] Isao Ono, et al. A Real Coded Genetic Algorithm for Function Optimization Using Unimodal Normal Distributed Crossover, 1997, ICGA.

[10] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[11] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.

[12] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.

[13] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.

[14] Michael L. Littman, et al. Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach, 1993, NIPS.

[15] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.