Repetitive assembly action recognition based on object detection and pose estimation

Abstract

This study employs deep learning methods to recognize repetitive assembly actions and to estimate their operating times, that is, how many times each action is performed. The goal is to monitor workers' assembly processes and to prevent assembly quality problems caused by skipped key operational steps and irregular operation. Because assembly actions are repetitive and tool-dependent, the study treats assembly action recognition as a tool object detection problem. First, the YOLOv3 algorithm is applied to locate and classify the assembly tools and thereby recognize the worker's assembly action, achieving an action recognition accuracy of 92.8%. Next, the deep-learning-based pose estimation algorithm CPM (Convolutional Pose Machines) is used to recognize human joints. Finally, the joint coordinates are extracted to determine the operating times of the repetitive assembly actions, with an accuracy of 82.1%.
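
Treating action recognition as tool detection means that once the detector reports which tool is in view, the action follows from the tool. The sketch below illustrates only that mapping step. It assumes detections have already been produced by a YOLOv3 model and converted into simple (label, confidence, box) records. The `Detection` record, the tool classes and action names in `TOOL_TO_ACTION`, the 0.5 confidence threshold, and the majority vote over frames are all hypothetical illustrations, not details taken from the study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Detection:
    """One detector output: predicted tool class, score, and bounding box."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


# Hypothetical tool classes and action names; the study's actual classes are not listed.
TOOL_TO_ACTION: Dict[str, str] = {
    "wrench": "tighten_bolt",
    "screwdriver": "drive_screw",
    "glue_gun": "apply_adhesive",
}


def frame_action(detections: List[Detection], threshold: float = 0.5) -> Optional[str]:
    """Map the most confident known tool in a single frame to its assembly action."""
    known = [d for d in detections
             if d.confidence >= threshold and d.label in TOOL_TO_ACTION]
    if not known:
        return None  # no confident tool detection, so no action is inferred
    best = max(known, key=lambda d: d.confidence)
    return TOOL_TO_ACTION[best.label]


def clip_action(frames: List[List[Detection]], threshold: float = 0.5) -> Optional[str]:
    """Majority vote over per-frame actions to suppress isolated misdetections."""
    per_frame = (frame_action(f, threshold) for f in frames)
    votes = Counter(a for a in per_frame if a is not None)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    clip = [
        [Detection("wrench", 0.91, (120, 80, 200, 160))],
        [Detection("screwdriver", 0.42, (300, 90, 340, 180))],  # below threshold
        [Detection("wrench", 0.84, (118, 82, 198, 158)),
         Detection("screwdriver", 0.70, (305, 95, 345, 185))],
    ]
    print(clip_action(clip))  # tighten_bolt
```

In this example, the low-confidence screwdriver in the second frame is ignored. In the third frame, the wrench outscores the screwdriver, so the clip is labelled `tighten_bolt`. Voting over several frames is one simple way to make a per-frame detector's output stable across a video clip.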

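The abstract states only that joint coordinates are used to judge the operating times. The sketch below shows one plausible way to do this, which is an assumption rather than the study's documented method. CPM outputs one belief map per joint, and the joint location can be taken as the map's highest-scoring cell. In practice that location would also need rescaling, since belief maps are typically coarser than the input image. One coordinate of one joint (for example, the wrist's vertical position) is then tracked over frames, smoothed, and counted with two thresholds (hysteresis), so that jitter around a single threshold is not counted twice. The function names, window size, and thresholds are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def belief_map_peak(belief_map: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return (row, col) of the highest score in one joint's belief map."""
    best_score, best_rc = float("-inf"), (0, 0)
    for r, row in enumerate(belief_map):
        for c, score in enumerate(row):
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Centered moving average; the window is truncated at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def count_repetitions(coords: Sequence[float], low: float, high: float,
                      window: int = 5) -> int:
    """Count cycles where the smoothed coordinate falls below `low`, then rises above `high`.

    Using two thresholds (hysteresis) keeps jitter around a single threshold
    from being counted as extra repetitions.
    """
    count, armed = 0, False  # armed: the low threshold was crossed since the last count
    for y in moving_average(coords, window):
        if y < low:
            armed = True
        elif y > high and armed:
            count += 1
            armed = False
    return count


if __name__ == "__main__":
    # Synthetic wrist trajectory: 5 cycles of 30 frames each, oscillating between 50 and 150 px.
    wrist_y = [100 - 50 * math.sin(2 * math.pi * t / 30) for t in range(150)]
    print(count_repetitions(wrist_y, low=80, high=120))  # 5
```

In image coordinates, y grows downward, so which threshold corresponds to a raised or lowered hand depends on the camera setup. The `low` and `high` values would therefore need calibrating for each workstation and each action.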