Mask-guided SSD for small-object detection

Detecting small objects is challenging for the single-shot multibox detector (SSD) because the features carry limited information and complex backgrounds interfere. Here, we improved the performance of the SSD on small target objects by enhancing detection features with contextual information and by introducing a segmentation mask to eliminate background regions. The proposed model, referred to as the mask-guided SSD (Mask-SSD), includes two branches: a detection branch and a segmentation branch. We created a feature-fusion module that allows the detection branch to exploit contextual information for high-resolution feature maps; the segmentation branch was built primarily with atrous convolution to provide additional contextual information to the detection branch. The segmentation branch took the output of the detection branch as its input, and the output segmentation features were fused with the detection features to classify and locate target objects. Additionally, the segmentation features were used to generate the mask, which guided the detection branch to find objects in potential foreground regions. Evaluation of Mask-SSD on the Tsinghua-Tencent 100K and Caltech pedestrian datasets demonstrated its effectiveness at detecting small objects and performance comparable to that of other state-of-the-art methods.
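The two mechanisms named in the abstract, atrous convolution and mask guidance, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not state how the mask is combined with the detection features, so the element-wise multiplication below is an assumption, and the function names `atrous_conv1d` and `apply_mask` are hypothetical. The convolution is shown in one dimension for brevity.

```python
def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D atrous (dilated) convolution in cross-correlation form.

    Kernel taps are spaced `rate` samples apart, so the receptive field
    grows with `rate` while the number of weights stays the same.
    """
    span = (len(kernel) - 1) * rate + 1  # receptive field in samples
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(k * signal[start + i * rate] for i, k in enumerate(kernel)))
    return out


def apply_mask(features, mask):
    """Gate a 2-D feature map with a foreground mask of the same shape.

    ASSUMPTION: element-wise multiplication; the abstract only says the
    mask guides detection toward potential foreground regions.
    """
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]


signal = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]
dense = atrous_conv1d(signal, kernel, rate=1)   # each output covers 3 samples
atrous = atrous_conv1d(signal, kernel, rate=2)  # each output covers 5 samples

features = [[1.0, 2.0], [3.0, 4.0]]
mask = [[0, 1], [1, 0]]  # 1 = potential foreground, 0 = background
guided = apply_mask(features, mask)
```

With the same three-tap kernel, raising `rate` from 1 to 2 widens the receptive field from 3 to 5 samples, which is how atrous convolution adds context without extra weights. The gating step zeroes features at positions the mask marks as background.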
