Spatial Attention Point Network for Deep-learning-based Robust Autonomous Robot Motion Generation

Deep learning provides a powerful framework for the automated acquisition of complex robotic motions. However, despite a certain degree of generalization, the need for vast amounts of training data that depend on the work-object position is an obstacle to industrial application. A robot motion-generation model that can respond to a variety of work-object positions with a small amount of training data is therefore necessary. In this paper, we propose a method that is robust to changes in object position by automatically extracting spatial attention points in the image that are relevant to the robot task and generating motions on the basis of their positions. We demonstrate our method with an LBR iiwa 7R1400 robot arm on a picking task and a pick-and-place task at various positions and in various situations. In each task, spatial attention points are obtained for the work objects that are important to the task. Because the method focuses only on task-relevant positions, it is robust not only to changes in object position but also to changes in background, lighting, and obstacles that are irrelevant to the task.
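The abstract does not spell out how the spatial attention points are computed. Below is a minimal sketch, not the authors' implementation, assuming a spatial-softmax-style keypoint extraction in the spirit of deep spatial autoencoders: each channel of a convolutional feature map is converted into one 2-D attention point by taking the expected image coordinate under a per-channel softmax. The function name, shapes, and temperature parameter are illustrative assumptions.

```python
# Hedged sketch: per-channel spatial-softmax attention points from a feature map.
# This is one plausible way to obtain "spatial attention points"; it is not taken
# from the paper's code.
import numpy as np


def spatial_attention_points(feature_map: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Return one (x, y) attention point per channel, in [-1, 1] image coordinates.

    feature_map: array of shape (channels, height, width), e.g. the output of the
    last convolutional layer of an image encoder.
    """
    c, h, w = feature_map.shape

    # Softmax over all spatial locations, separately for each channel.
    flat = feature_map.reshape(c, -1) / temperature
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(c, h, w)

    # Normalized pixel-coordinate grids in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h),
                         np.linspace(-1.0, 1.0, w), indexing="ij")

    # Expected (x, y) location under each channel's attention distribution.
    x_pts = (probs * xs).sum(axis=(1, 2))
    y_pts = (probs * ys).sum(axis=(1, 2))
    return np.stack([x_pts, y_pts], axis=1)  # shape (channels, 2)


if __name__ == "__main__":
    # Toy example: one strong activation near the top-left corner should yield
    # an attention point near (-1, -1).
    fmap = np.zeros((1, 8, 8))
    fmap[0, 1, 1] = 10.0
    print(spatial_attention_points(fmap))
```

In a pipeline like the one described, such low-dimensional attention points (rather than the full image) would then be fed to the motion-generation model, which is what makes the policy insensitive to task-irrelevant changes in background or lighting.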
