Dataset and benchmark for detecting moving objects in construction sites

Abstract

Detecting workers and equipment in images and videos can assist safety monitoring, quality control, and productivity management at construction sites. Currently, the dominant detection method is deep neural networks (DNNs). To apply this method, the DNNs must be trained on image datasets that contain objects found at construction sites. However, no large-scale, publicly available image dataset for detecting objects at construction sites exists, which hinders research in this field. This study presents the Moving Objects in Construction Sites (MOCS) image dataset. The dataset contains 41,668 images collected from 174 different construction sites. Thirteen categories of moving objects found at construction sites were annotated. The objects were annotated with per-pixel segmentations to support precise object localization. A detailed statistical analysis was performed. Finally, a benchmark of 15 different DNN-based detectors was built using the MOCS dataset. The results show that all detectors trained on the dataset could detect objects at construction sites precisely and robustly.
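
To illustrate why per-pixel segmentations support precise localization, the following minimal sketch derives a tight bounding box and a pixel area from a binary mask. It is not taken from the MOCS tooling: the mask representation (a list of rows of 0/1 values) and the helper names `mask_to_bbox` and `mask_area` are assumptions made for illustration only, and the dataset's actual annotation format is not described in this abstract.

```python
from typing import List, Optional, Tuple

# Assumed representation: a mask is a list of rows, each a list of 0/1 values,
# where 1 marks a pixel that belongs to the annotated object.
Mask = List[List[int]]


def mask_to_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) covering all object pixels.

    Coordinates are inclusive pixel indices. Returns None for an empty mask.
    """
    # Rows that contain at least one object pixel, in top-to-bottom order.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every object pixel.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return min(cols), rows[0], max(cols), rows[-1]


def mask_area(mask: Mask) -> int:
    """Count the object pixels in the mask."""
    return sum(1 for row in mask for value in row if value)


if __name__ == "__main__":
    # Hypothetical 4x4 mask with an L-shaped object.
    example = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(mask_to_bbox(example))
    print(mask_area(example))
```

A box can always be recovered from a mask in this way, whereas the reverse is not possible, which is one reason a segmentation-level annotation can serve both box-based and mask-based detectors.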
