Assembly Motion Recognition Framework Using Only Images

This work proposes a method for recognizing assembly tasks and segmenting them into single motions. First, using a motion capture system based on keypoint pose estimation, we obtain time-series data of the human's motion during an assembly task (motion data). An object detection algorithm determines which assembly parts and tools the user (human) is grasping. We then segment the assembly motion based on changes of the manipulated object and the velocity of the hand. Each segmented motion is recognized using several Hidden Markov Models (HMMs) that represent the actions that can be executed with the manipulated object(s). To train the HMMs, we recorded the assembly of a toy airplane performed by two experts, and to validate the proposed method, we recorded the assembly motions of five subjects.
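The segmentation and recognition steps above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes segment boundaries are placed at frames where the grasped object changes while the hand speed is below a threshold, and it assumes one pretrained HMM per candidate action exposing a log-likelihood `score` method (as in `hmmlearn`'s `GaussianHMM`). The threshold value, object labels, and function names are all hypothetical.

```python
import numpy as np

def segment_motion(hand_speed, grasped_object, speed_thresh=0.05):
    """Split a motion sequence into segments.

    A boundary is placed at frame t when the detected grasped object
    changes and the hand speed is low (assumed heuristic).
    Returns a list of (start, end) frame ranges, end exclusive.
    """
    boundaries = [0]
    for t in range(1, len(grasped_object)):
        if grasped_object[t] != grasped_object[t - 1] and hand_speed[t] < speed_thresh:
            boundaries.append(t)
    boundaries.append(len(grasped_object))
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]

def recognize_segment(segment_features, action_models):
    """Classify one segment of motion data.

    action_models maps an action label to a trained HMM with a
    score(X) method returning the log-likelihood of the observation
    sequence X (e.g. hmmlearn.hmm.GaussianHMM). The label of the
    best-scoring model is returned.
    """
    return max(action_models, key=lambda label: action_models[label].score(segment_features))
```

In use, only the HMMs whose actions are compatible with the currently manipulated object would be scored, which is how the object detection result constrains recognition in the described method.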
