Acquiring visual servoing reaching and grasping skills using neural reinforcement learning

In this work we present a reinforcement learning system for autonomous reaching and grasping using visual servoing with a robotic arm. Control is realized in a visual feedback control loop, making it both reactive and robust to noise. The controller is learned from scratch, solely from the success or failure of its attempts, without any added information about the task's solution. All of the system's major components are implemented as neural networks. The system is applied directly on a real robotic platform to solve a combined reaching and grasping task involving uncertainty. We describe its main parts and the conditions for their successful interoperation. We show that, even with minimal prior knowledge, the system learns in a short amount of time to perform its task reliably. Furthermore, we describe the control system's ability to react to changes and errors.
