Evaluating techniques for learning a feedback controller for low-cost manipulators

Robust, tractable manipulation in unstructured environments is a prominent hurdle in robotics. Learning algorithms for controlling robotic arms have offered elegant solutions to the complexities such systems face. A recent reinforcement learning (RL) method, Gaussian Process Dynamic Programming (GPDP), yields promising results for closed-loop control of a low-cost manipulator; however, research on most RL techniques lacks a breadth of comparable experiments assessing the viability of particular learning techniques in equivalent environments. We introduce several model-based learning agents as mechanisms for controlling a noisy, low-cost robotic system. The agents were tested in a simulated domain, learning closed-loop policies for a simple task with no prior information. The fidelity of the simulations was then confirmed by applying GPDP to a physical system.
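The core idea behind GPDP can be illustrated with a minimal sketch: learn a model of the system dynamics from noisy transition data via Gaussian process regression, then run dynamic programming (value iteration) on a discretized state space using the learned model. Everything below is an illustrative assumption, not the paper's setup: the one-dimensional plant `true_step`, the quadratic cost, the kernel hyperparameters, and the grid sizes are all hypothetical.

```python
# Hedged sketch of the GPDP idea: (1) fit a GP dynamics model from noisy
# transition samples, (2) run value iteration with the learned model.
# The plant, cost, and hyperparameters are illustrative assumptions only.
import numpy as np

def rbf(A, B, ell=0.5, sf=1.0):
    """Squared-exponential kernel between row vectors in A and B."""
    d = A[:, None, :] - B[None, :, :]
    return sf**2 * np.exp(-0.5 * np.sum(d**2, axis=2) / ell**2)

# --- 1. Collect noisy transition data from the (unknown) true system ---
rng = np.random.default_rng(0)
true_step = lambda s, a: s + 0.1 * a - 0.05 * np.sin(s)  # hypothetical plant
S = rng.uniform(-2, 2, size=100)                          # sampled states
A = rng.uniform(-1, 1, size=100)                          # sampled actions
Y = true_step(S, A) + 0.01 * rng.standard_normal(100)     # noisy next states

# --- 2. Fit a GP model of the dynamics: (s, a) -> s' (posterior mean) ---
X = np.column_stack([S, A])
K = rbf(X, X) + 1e-4 * np.eye(len(X))                     # kernel + noise jitter
alpha = np.linalg.solve(K, Y)
def gp_mean(s, a):
    return float(rbf(np.array([[s, a]]), X) @ alpha)

# --- 3. Value iteration over a discretized state/action grid ---
states = np.linspace(-2, 2, 41)
actions = np.linspace(-1, 1, 9)
cost = lambda s, a: s**2 + 0.1 * a**2                     # drive state to origin
V = np.zeros_like(states)
for _ in range(50):
    V = np.array([min(cost(s, a) + 0.9 * np.interp(gp_mean(s, a), states, V)
                      for a in actions)
                  for s in states])

# Greedy closed-loop policy with respect to the learned model and V.
policy = lambda s: actions[np.argmin(
    [cost(s, a) + 0.9 * np.interp(gp_mean(s, a), states, V) for a in actions])]
print(policy(1.5))  # a negative action, pushing the state toward 0
```

The GP's uncertainty estimates (omitted here, which uses only the posterior mean) are what make the full method data-efficient in practice; this sketch shows only the model-then-plan structure shared by the agents the paper compares.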

[1]  J. R. Quinlan.  Learning with Continuous Classes, 1992.

[2]  José Antonio Martín H., et al.  A distributed reinforcement learning control architecture for multi-link robots - experimental validation, 2007, ICINCO-ICSO.

[3]  Carl E. Rasmussen, et al.  Gaussian process dynamic programming, 2009, Neurocomputing.

[4]  Carl E. Rasmussen, et al.  Gaussian processes for machine learning, 2005, Adaptive Computation and Machine Learning.

[5]  J. Ross Quinlan, et al.  Induction of Decision Trees, 1986, Machine Learning.

[6]  Chris Watkins.  Learning from delayed rewards, 1989.

[7]  Mahesan Niranjan, et al.  On-line Q-learning using connectionist systems, 1994.

[8]  Csaba Szepesvári, et al.  Bandit Based Monte-Carlo Planning, 2006, ECML.

[9]  Marc Peter Deisenroth.  Efficient reinforcement learning using Gaussian processes, 2010.

[10]  Peter Stone, et al.  Real time targeted exploration in large domains, 2010, IEEE International Conference on Development and Learning.

[11]  Andrea Lockerd Thomaz, et al.  Teachable robots: Understanding human teaching behavior to build more effective robot learners, 2008, Artif. Intell.

[13]  Jan Peters, et al.  Policy Search for Motor Primitives in Robotics, 2008, NIPS.

[14]  Carl E. Rasmussen, et al.  PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.

[15]  Siddhartha S. Srinivasa, et al.  Autonomous manipulation with a general-purpose simple hand, 2011, Int. J. Robotics Res.

[16]  Carl E. Rasmussen, et al.  Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning, 2011, Robotics: Science and Systems.

[17]  Ronen I. Brafman, et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.

[18]  Pieter Abbeel, et al.  Cloth grasp point detection based on multiple-view geometric cues with application to robotic towel folding, 2010, IEEE International Conference on Robotics and Automation.

[19]  Jason Pazis, et al.  Reinforcement learning in multidimensional continuous action spaces, 2011, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).

[20]  Richard S. Sutton, et al.  Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.

[21]  Hajime Kimura.  Reinforcement learning in multi-dimensional state-action space using random rectangular coarse coding and Gibbs sampling, 2007, SICE Annual Conference.