Saliency Guided 2D-Object Annotation for Instrumented Vehicles

Instrumented vehicles can produce huge volumes of video data per vehicle per day, which must be analysed automatically and often in real time. This analysis should include identifying the objects present and tagging them with semantic concepts such as car or pedestrian. An important element in achieving this is the annotation of training data for machine learning algorithms, which requires accurate labels at a high level of granularity. Current practice is to use trained human annotators, who can annotate only a limited volume of video per day. In this paper, we demonstrate how a generic human saliency classifier can provide visual cues for object detection using deep learning approaches. Our work is applied to datasets for autonomous driving. Our experiments show that utilizing visual saliency improves the detection of small objects and increases overall accuracy compared with a standalone single shot multibox detector.
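The abstract does not say how a saliency map is turned into a cue for the detector, so the following is only an illustrative sketch of one plausible step, not the authors' pipeline. It assumes the saliency predictor outputs a 2D map with values in [0, 1], binarises it with Otsu's threshold (a technique chosen here for illustration), and takes the tight bounding box of the salient pixels as a candidate region. The function names `otsu_threshold` and `salient_box` are hypothetical.

```python
def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold in [0, 1] that maximises the
    between-class variance of the histogram. Assumes values lie in [0, 1]."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_bin, best_var = 0, -1.0
    w0 = 0      # number of samples at or below the candidate bin
    sum0 = 0    # sum of bin indices for those samples
    for i, h in enumerate(hist):
        w0 += h
        sum0 += i * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_bin, best_var = i, var
    # Upper edge of the best bin: values >= this are treated as salient.
    return (best_bin + 1) / bins


def salient_box(saliency):
    """Return (x_min, y_min, x_max, y_max) of the salient region in a
    2D saliency map (list of rows), or None if nothing is salient.
    Illustrative only: a single box around all salient pixels."""
    flat = [v for row in saliency for v in row]
    t = otsu_threshold(flat)
    ys = [y for y, row in enumerate(saliency) if any(v >= t for v in row)]
    xs = [x for row in saliency for x, v in enumerate(row) if v >= t]
    if not ys:
        return None
    return (min(xs), ys[0], max(xs), ys[-1])
```

In a real setting the map would come from a trained saliency model, and each connected salient region would likely get its own box before being passed to, or combined with, the object detector. This sketch merges everything into one box to stay short.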
