Paper Information - Probability Redistribution using Time Hopping for Reinforcement Learning

A method for using the Time Hopping technique as a tool for probability redistribution is proposed. Applied to reinforcement learning in a simulation, it is able to re-shape the state probability distribution of the underlying Markov decision process as desired. This is achieved by modifying the target selection strategy of Time Hopping appropriately. Experiments with a robot maze reinforcement learning problem show that the method improves the exploration efficiency by re-shaping the state probability distribution to an almost uniform distribution.
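The idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the chain environment, the fixed hopping interval, and the "hop to the least-visited state" target selection rule are all assumptions chosen for illustration, as are the names `visit_counts` and `distance_from_uniform`. The sketch runs a biased random walk that would normally concentrate its visits near one end of a chain, then lets a simulated "hop" move the agent to an under-visited state at regular intervals, and compares how far each visit distribution is from uniform.

```python
import random


def visit_counts(num_states=10, steps=20000, hop_every=None, seed=0):
    """Count state visits of a biased random walk on a chain.

    If hop_every is set, every hop_every-th step the agent hops directly to
    the least-visited state (an assumed target selection rule, possible only
    because a simulation can be placed in an arbitrary state).
    """
    rng = random.Random(seed)
    counts = [0] * num_states
    state = 0
    for t in range(1, steps + 1):
        counts[state] += 1
        if hop_every and t % hop_every == 0:
            # Hypothetical target selection: pick the least-visited state.
            state = min(range(num_states), key=counts.__getitem__)
        else:
            # Ordinary transition: drift toward state 0 with probability 0.7.
            step = -1 if rng.random() < 0.7 else 1
            state = min(max(state + step, 0), num_states - 1)
    return counts


def distance_from_uniform(counts):
    """Total variation distance between visit frequencies and uniform."""
    total = sum(counts)
    uniform = 1.0 / len(counts)
    return 0.5 * sum(abs(c / total - uniform) for c in counts)


if __name__ == "__main__":
    plain = visit_counts()
    hopped = visit_counts(hop_every=5)
    print("no hopping  :", plain, round(distance_from_uniform(plain), 3))
    print("with hopping:", hopped, round(distance_from_uniform(hopped), 3))
```

In this toy setting the only thing that changes between the two runs is the target selection step, which mirrors the abstract's claim that the state distribution is re-shaped by modifying target selection rather than the learning rule itself.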

Kaoru Hirota | Fangyan Dong | Petar Kormushev
