Comparative Evaluation of Reinforcement Learning with Scalar Rewards and Linear Regression with Multidimensional Feedback

This paper presents a comparative evaluation of two learning approaches. The first is a conventional reinforcement learning algorithm for direct policy search, which by definition uses scalar rewards. The second is a custom linear-regression-based algorithm that uses multidimensional feedback instead of a scalar reward. The two approaches are evaluated in simulation on a common benchmark problem: an aiming task in which the goal is to learn the aiming parameters that result in hitting as close as possible to a given target. The comparative evaluation shows that multidimensional feedback provides a significant advantage over scalar reward, yielding an order-of-magnitude speed-up in convergence. A real-world experiment with a humanoid robot confirms the simulation results and highlights the importance of multidimensional feedback for fast learning.
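The intuition behind the reported speed-up can be illustrated with a toy sketch. The paper's actual task and algorithms are not specified here, so the following assumes a hypothetical affine aiming simulator and uses simple stand-ins: random hill climbing driven only by a scalar reward (the negative distance to the target), versus fitting a linear regression model to the full multidimensional hit positions and solving it for the target. All names and parameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical aiming simulator (unknown to the learners): hit = A @ theta + b
A = np.array([[1.2, -0.3], [0.4, 0.9]])
b = np.array([0.5, -0.2])
target = np.array([2.0, 1.0])

def hit(theta):
    return A @ theta + b

def scalar_search(trials=200):
    """Scalar-reward learner: random hill climbing on -distance only."""
    theta = np.zeros(2)
    best = -np.linalg.norm(hit(theta) - target)
    for _ in range(trials):
        cand = theta + rng.normal(scale=0.3, size=2)
        reward = -np.linalg.norm(hit(cand) - target)  # scalar feedback
        if reward > best:
            theta, best = cand, reward
    return -best  # remaining distance to target

def regression_search(trials=10):
    """Multidimensional-feedback learner: regress hit positions on
    parameters, then solve the fitted affine model for the target."""
    thetas = rng.normal(size=(trials, 2))
    hits = np.array([hit(t) for t in thetas])       # full 2-D feedback
    X = np.hstack([thetas, np.ones((trials, 1))])   # affine features
    W, *_ = np.linalg.lstsq(X, hits, rcond=None)    # hit ≈ [theta, 1] @ W
    theta = np.linalg.solve(W[:2].T, target - W[2]) # invert for the target
    return np.linalg.norm(hit(theta) - target)

print(scalar_search(200))    # residual error after 200 scalar-reward trials
print(regression_search(10)) # residual error after only 10 trials
```

Because the regression learner sees the full error vector rather than a single collapsed number, each trial constrains the model in every output dimension at once, which is why far fewer trials suffice in this idealized setting.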
