Learning accurate personal protective equipment detection from virtual worlds

Deep learning has achieved impressive results in many machine learning tasks such as image recognition and computer vision. Its applicability to supervised problems is, however, constrained by the availability of high-quality training data consisting of large numbers of human-annotated examples (e.g., millions). To overcome this problem, the AI community is increasingly exploiting artificially generated images or video sequences produced by photorealistic rendering engines such as those used in entertainment applications. In this way, large sets of training images can be easily created to train deep learning algorithms. In this paper, we generated photorealistic synthetic image sets to train deep learning models to recognize the correct use of personal safety equipment (e.g., worker safety helmets, high-visibility vests, ear protection devices) during at-risk work activities. We then performed domain adaptation to real-world images using a very small set of real-world examples. We demonstrated that training on the generated synthetic set, followed by the domain adaptation phase, is an effective solution for applications where no real-world training set is available.
