Assembly Manipulation Understanding Based on 3D Object Pose Estimation and Human Motion Estimation

In this paper, we present a method for understanding assembly manipulation by demonstration. A human demonstrator performs an assembly manipulation in front of a 3D range camera system, and the system recognizes each manipulation from two aspects: 3D object poses and hand motion. The poses of the assembled parts are estimated with LINEMOD, a template-matching method. Meanwhile, hand motion features are extracted from the trajectories of the hand joints, which are detected with OpenPose, a human pose estimation method. We combine both features to obtain a more robust representation and to compute the probability of each action class. Finally, we confirmed that the proposed method recognizes each assembly manipulation as the highest-probability action class.
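The two-stream fusion step can be illustrated with a minimal sketch. The Python code below is an assumption-laden illustration, not the paper's implementation: the specific feature summaries (averaged 6-DoF object poses, mean per-joint displacement), the concatenation-based fusion, and the linear softmax classifier are placeholder choices; the abstract only states that object-pose and hand-motion features are combined to score action classes.

import numpy as np

def pose_feature(rotations, translations):
    # Flatten per-frame 6-DoF object poses (e.g. from a LINEMOD-style
    # template matcher) into one descriptor by averaging over frames.
    poses = np.concatenate([rotations.reshape(len(rotations), -1),
                            translations], axis=1)        # (T, 12)
    return poses.mean(axis=0)                             # (12,)

def hand_motion_feature(joints):
    # Summarize hand-joint trajectories (e.g. 2D keypoints from OpenPose)
    # by the mean frame-to-frame displacement of each joint.
    velocity = np.diff(joints, axis=0)                    # (T-1, J, 2)
    return np.linalg.norm(velocity, axis=2).mean(axis=0)  # (J,)

def action_probabilities(fused, weights, bias):
    # Softmax over action classes for the fused feature vector.
    logits = weights @ fused + bias
    logits -= logits.max()                                # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Example with synthetic data: 30 frames, 21 hand joints, 4 action classes.
T, J, K = 30, 21, 4
rng = np.random.default_rng(0)
fused = np.concatenate([
    pose_feature(rng.standard_normal((T, 3, 3)),
                 rng.standard_normal((T, 3))),
    hand_motion_feature(rng.standard_normal((T, J, 2))),
])
probs = action_probabilities(fused,
                             rng.standard_normal((K, fused.size)),
                             np.zeros(K))
print("predicted class:", probs.argmax(), "probabilities:", probs.round(3))

Late fusion (combining the streams before a single classifier) is only one option; the same scaffolding works for averaging per-stream class probabilities instead, and the recognized class is simply the one with the highest probability, matching the decision rule described above.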
