Probability Redistribution using Time Hopping for Reinforcement Learning
[1] Dana H. Ballard, et al. Learning to perceive and act by trial and error, 1991, Machine Learning.
[2] Sebastian Thrun, et al. Efficient Exploration in Reinforcement Learning, 1992.
[3] Pieter Abbeel, et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight, 2006, NIPS.
[4] Leslie Pack Kaelbling, et al. Reinforcement Learning by Policy Search, 2002.
[5] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[6] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[8] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[9] Jin-Woo Jung, et al. Development of Shopping Messenger (Shop-senger) for Getting More Firsthand Information, 2009.
[10] Robert Givan, et al. Relational Reinforcement Learning: An Overview, 2004, ICML.
[11] Stefan Schaal, et al. Reinforcement Learning for Humanoid Robotics, 2003.
[12] Pieter Abbeel, et al. Learning for control from multiple demonstrations, 2008, ICML '08.
[13] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[14] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[15] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[16] Andrew Y. Ng. Reinforcement Learning and Apprenticeship Learning for Robotic Control, 2006, Discovery Science.
[17] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[18] Kaoru Hirota, et al. Time Hopping technique for faster reinforcement learning in simulations, 2009, arXiv.
[19] Maja J. Matarić, et al. Action Selection methods using Reinforcement Learning, 1996.
[20] Kaoru Hirota, et al. Time manipulation technique for speeding up reinforcement learning in simulations, 2008, arXiv.
[21] P. Dayan, et al. TD(λ) converges with probability 1, 2004, Machine Learning.
[22] Guido Bugmann, et al. Neuro-Resistive Grid approach to trainable controllers: A pole balancing example, 1997, Neural Computing & Applications.
[23] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).
[24] Hidetomo Ichihashi, et al. A Study on Cluster Validation in Fuzzy Clustering Based on PCA-guided Procedure, 2009.
[25] Shlomo Geva, et al. The Cart-Pole Experiment as a Benchmark for Trainable Controllers, 1992.
[26] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[27] Stefan Schaal, et al. Learning and generalization of motor skills by learning from demonstration, 2009, IEEE International Conference on Robotics and Automation.