Paper Information - Learning to Touch Objects Through Stage-Wise Deep Reinforcement Learning

Learning to Touch Objects Through Stage-Wise Deep Reinforcement Learning

Learning complex behaviors through reinforcement learning is particularly challenging when reward is only available upon successful completion of the full behavior. In manipulation robotics, so-called shaping rewards are often used to overcome this problem. However, these usually require human engineering or (partial) world models describing, e.g., the kinematics of the robot, or high-level modules for perception. Here we propose an alternative method to learn an object palm-touching task through weakly supervised, stage-wise learning of simpler tasks. First, the robot learns to fixate the object with its cameras. Second, the robot learns eye-hand coordination by learning to fixate its end effector. Third, using the previously acquired skills, an informative shaping reward can be computed, which facilitates efficient learning of the object palm-touching task. We demonstrate in simulation that learning the full task with this shaping reward is comparable to learning with an informative supervised reward.
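The abstract does not say how the shaping reward is computed from the two fixation skills. One plausible reading, offered only as an illustrative sketch, is to compare the camera angles that fixate the object with those that fixate the end effector: the closer the two gaze configurations, the closer the hand is to the object. The function name `shaping_reward`, the angle-vector representation, and the Euclidean distance are all assumptions made for this example, not details taken from the paper.

```python
import math
from typing import Sequence


def shaping_reward(gaze_on_object: Sequence[float],
                   gaze_on_hand: Sequence[float]) -> float:
    """Hypothetical shaping reward (an assumption, not the paper's formula).

    gaze_on_object: camera angles obtained by fixating the object (stage 1 skill).
    gaze_on_hand:   camera angles obtained by fixating the end effector (stage 2 skill).
    Returns the negative Euclidean distance between the two angle vectors,
    so the reward increases toward 0 as the hand approaches the object.
    """
    if len(gaze_on_object) != len(gaze_on_hand):
        raise ValueError("gaze vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(gaze_on_object, gaze_on_hand))
    return -math.sqrt(squared)


# Example with made-up angle vectors (e.g. pan, tilt, vergence in radians).
far = shaping_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
near = shaping_reward((0.1, 0.2, 0.3), (0.1, 0.2, 0.3))
```

Under these assumptions the reward is dense: it changes at every step as the arm moves, unlike a sparse signal that is given only when the palm actually touches the object.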

Thierry Chateau | Jochen Triesch | Céline Teulière | François De La Bourdonnaye

[1] Thierry Chateau, et al. Learning of binocular fixations using anomaly detection with deep reinforcement learning, 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[2] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.

[3] Danica Kragic, et al. Deep predictive policy training using reinforcement learning, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[4] Pieter Abbeel, et al. Reverse Curriculum Generation for Reinforcement Learning, 2017, CoRL.

[5] Gaurav S. Sukhatme, et al. Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning, 2017, ICML.

[6] Peter Stone, et al. Deep Reinforcement Learning in Parameterized Action Space, 2015, ICLR.

[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[8] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[9] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[10] K. Fischer. A theory of cognitive development: The control and construction of hierarchies of skills, 1980.

[11] Nolan Wagener, et al. Learning contact-rich manipulation skills with guided policy search, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[12] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.

[13] Minoru Asada, et al. Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning, 2005, Machine Learning.

[14] Giulio Sandini, et al. Autonomous learning of 3D reaching in a humanoid robot, 2007, 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.

[16] Sergey Levine, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[17] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[18] Morgan Quigley, et al. ROS: an open-source Robot Operating System, 2009, ICRA 2009.

[19] Sergey Levine, et al. Deep spatial autoencoders for visuomotor learning, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[20] David P. Carey, et al. Magnetic Misreaching, 1997, Cortex.

[21] Wolfram Schenck, et al. Learning visuomotor transformations for gaze-control and grasping, 2005, Biological Cybernetics.

[22] Sergey Levine, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016, Int. J. Robotics Res.

[23] Giorgio Metta, et al. Early integration of vision and manipulation, 2003, Proceedings of the International Joint Conference on Neural Networks, 2003.

[24] Marco Antonelli, et al. Implicit Sensorimotor Mapping of the Peripersonal Space by Gazing and Reaching, 2011, IEEE Transactions on Autonomous Mental Development.

[25] R. Held, et al. Observations on the Development of Visually-Directed Reaching, 1964, Child Development.

[26] Patricia Shaw, et al. From Saccades to Grasping: A Model of Coordinated Reaching Through Simulated Development on a Humanoid Robot, 2014, IEEE Transactions on Autonomous Mental Development.

[27] Takamitsu Matsubara, et al. Deep dynamic policy programming for robot control with raw images, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[28] Carl E. Rasmussen, et al. Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning, 2011, Robotics: Science and Systems.

[29] Abdeslam Boularias, et al. Learning to Manipulate Unknown Objects in Clutter by Reinforcement, 2015, AAAI.