Efficient Single-Shot Multibox Detector for Construction Site Monitoring

Asset monitoring in construction sites is an intricate, manually intensive task, that can highly benefit from automated solutions engineered using deep neural networks. We use Single-Shot Multibox Detector — SSD, for its fine balance between speed and accuracy, to leverage ubiquitously available images and videos from the surveillance cameras on the construction sites and automate the monitoring tasks, hence enabling project managers to better track the performance and optimize the utilization of each resource. We propose to improve the performance of SSD by clustering the predicted boxes instead of a greedy approach like non-maximum suppression. We do so using Affinity Propagation Clustering — APC to cluster the predicted boxes based on the similarity index computed using the spatial features as well as location of predicted boxes. In our attempts, we have been able to improve the mean average precision of SSD by 3.77% on custom dataset consist of images from construction sites and by 1.67% on PASCAL VOC Challenge.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[3]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[4]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[6]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[9]  Nikolaos Doulamis,et al.  Deep Learning for Computer Vision: A Brief Review , 2018, Comput. Intell. Neurosci..

[10]  Georgios K. Ouzounis,et al.  Smart cities of the future , 2012, The European Physical Journal Special Topics.

[11]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).