Learning to Touch Objects Through Stage-Wise Deep Reinforcement Learning

Learning complex behaviors through reinforcement learning is particularly challenging when a reward is available only upon successful completion of the full behavior. In robotic manipulation, so-called shaping rewards are often used to overcome this problem. However, these usually require human engineering or (partial) world models describing, e.g., the kinematics of the robot, or high-level modules for perception. Here we propose an alternative method for learning an object palm-touching task through weakly supervised, stage-wise learning of simpler tasks. First, the robot learns to fixate the object with its cameras. Second, the robot learns eye-hand coordination by learning to fixate its end effector. Third, using the previously acquired skills, an informative shaping reward can be computed, which facilitates efficient learning of the object palm-touching task. We demonstrate in simulation that learning the full task with this shaping reward performs comparably to learning with an informative supervised reward.
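As a rough illustration of the third stage, the sketch below shows one way a shaping reward could be derived from the two fixation skills. It assumes that each skill yields a camera configuration (for example pan, tilt, and vergence angles) and that the reward is the negative distance between the configuration fixating the object and the one fixating the end effector. The function names, the choice of Euclidean distance, and the touch bonus are hypothetical and are not taken from the abstract.

```python
import math


def shaping_reward(gaze_on_object, gaze_on_hand):
    """Hypothetical shaping reward: negative Euclidean distance between the
    camera configuration that fixates the object and the one that fixates
    the end effector. It is 0 when both configurations coincide."""
    if len(gaze_on_object) != len(gaze_on_hand):
        raise ValueError("camera configurations must have the same length")
    return -math.dist(gaze_on_object, gaze_on_hand)


def touch_reward(gaze_on_object, gaze_on_hand, touched, bonus=1.0):
    """Assumed combination: shaping term plus a sparse bonus on palm contact."""
    return shaping_reward(gaze_on_object, gaze_on_hand) + (bonus if touched else 0.0)


if __name__ == "__main__":
    # Illustrative values only (radians): pan, tilt, vergence.
    object_gaze = (0.10, -0.20, 0.05)
    far_hand_gaze = (0.60, 0.30, 0.20)
    near_hand_gaze = (0.12, -0.18, 0.05)
    print(shaping_reward(object_gaze, far_hand_gaze))
    print(shaping_reward(object_gaze, near_hand_gaze))
```

Under these assumptions the reward grows as the hand approaches the object in gaze space, so it gives a learning signal before any contact occurs and needs no explicit kinematic model.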
