An Initialization Method of Deep Q-network for Learning Acceleration of Robotic Grasp

Self-supervised learning of robotic grasping generally relies on a model-free reinforcement learning method such as a Deep Q-network (DQN). A DQN uses a high-dimensional Q-network to infer dense pixel-wise probability maps of affordances for grasping actions, which typically leads to a time-consuming training process. Inspired by initialization strategies in optimization algorithms, we propose an initialization method for accelerating self-supervised learning of robotic grasping. The Q-network is pre-trained by supervised learning of affordance maps, using only a small dataset with coarse-grained labels, before the robotic grasp training begins. With the pre-trained Q-network, a robot can then be trained through self-supervised trial-and-error in a purposeful manner, avoiding meaningless grasp attempts in empty regions. We evaluate the proposed method with Mean Square Error, Smooth L1, and Kullback-Leibler Divergence (KLD) loss functions in the pre-training phase. The results indicate that the KLD loss predicts affordances accurately with less noise in empty regions. Moreover, our method accelerates self-supervised learning significantly in the early stage and is largely insensitive to the sparsity of objects in the workspace.
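
As a rough illustration of the pre-training phase described above, the following PyTorch-style sketch shows how a pixel-wise affordance Q-network could be pre-trained on coarse labels with a KL-divergence loss. The network, tensor shapes, and helper names (e.g. `affordance_net`, `pretrain_step`) are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch, assuming a fully convolutional "affordance_net" that maps an
# RGB-D heightmap to a 1-channel pixel-wise grasp affordance map, and coarse
# binary labels marking object regions (1 on objects, 0 on empty workspace).
import torch
import torch.nn.functional as F

def pretrain_step(affordance_net, optimizer, heightmap, coarse_label):
    """One supervised pre-training step with a KL-divergence loss.

    heightmap:    (B, C, H, W) input image tensor
    coarse_label: (B, 1, H, W) coarse-grained affordance annotation
    """
    logits = affordance_net(heightmap)  # (B, 1, H, W) raw affordance scores
    b = logits.size(0)

    # Treat each map as a distribution over pixels so the KLD is well defined.
    log_pred = F.log_softmax(logits.view(b, -1), dim=1)
    target = coarse_label.view(b, -1)
    target = target / target.sum(dim=1, keepdim=True).clamp_min(1e-8)

    loss = F.kl_div(log_pred, target, reduction='batchmean')

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the KLD target is the normalized coarse label map, which pushes predicted affordance mass toward object regions and away from empty workspace; MSE or Smooth L1 pre-training would instead regress the raw per-pixel values.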
