Small-objectness sensitive detection based on shifted single shot detector

We present a small object sensitive method for object detection. Our method is built based on SSD (Single Shot MultiBox Detector (Liu et al. 2016)), a simple but effective deep neural network for image object detection. The discrete nature of anchor mechanism used in SSD, however, may cause misdetection for the small objects located at gaps between the anchor boxes. SSD performs better for small object detection after circular shifts of the input image. Therefore, auxiliary feature maps are generated by conducting circular shifts over lower extra feature maps in SSD for small-object detection, which is equivalent to shifting the objects in order to fit the locations of anchor boxes. We call our proposed system Shifted SSD. Moreover, pinpoint accuracy of localization is of vital importance to small objects detection. Hence, two novel methods called Smooth NMS and IoU-Prediction module are proposed to obtain more precise locations. Then for video sequences, we generate trajectory hypothesis to obtain predicted locations in a new frame for further improved performance. Experiments conducted on PASCAL VOC 2007, along with MS COCO, KITTI and our small object video datasets, validate that both mAP and recall are improved with different degrees and the speed is almost the same as SSD.

[1]  Jing Liu,et al.  Clustering-Guided Sparse Structural Learning for Unsupervised Feature Selection , 2014, IEEE Transactions on Knowledge and Data Engineering.

[2]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[4]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[5]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[6]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[7]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[8]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Ian D. Reid,et al.  Joint tracking and segmentation of multiple targets , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[12]  Silvio Savarese,et al.  Data-driven 3D Voxel Patterns for object category recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Fuchun Sun,et al.  RON: Reverse Connection with Objectness Prior Networks for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Jianping Fan,et al.  iPrivacy: Image Privacy Protection by Identifying Sensitive Objects via Deep Multi-Task Learning , 2017, IEEE Transactions on Information Forensics and Security.

[16]  Meng Wang,et al.  Image-Based Three-Dimensional Human Pose Recovery by Multiview Locality-Sensitive Sparse Retrieval , 2015, IEEE Transactions on Industrial Electronics.

[17]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[18]  Liang Lin,et al.  Is Faster R-CNN Doing Well for Pedestrian Detection? , 2016, ECCV.

[19]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[20]  Silvio Savarese,et al.  Subcategory-Aware Convolutional Neural Networks for Object Proposals and Detection , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[21]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[22]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Derek Hoiem,et al.  Diagnosing Error in Object Detectors , 2012, ECCV.

[24]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[25]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[26]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[27]  Meng Wang,et al.  Multimodal Deep Autoencoder for Human Pose Recovery , 2015, IEEE Transactions on Image Processing.

[28]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Nojun Kwak,et al.  Analysis on the Dropout Effect in Convolutional Neural Networks , 2016, ACCV.

[31]  Nikos Komodakis,et al.  LocNet: Improving Localization Accuracy for Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Jianxiong Xiao,et al.  R-CNN for Small Object Detection , 2016, ACCV.

[34]  Jun Yu,et al.  Multitask Autoencoder Model for Recovering Human Poses , 2018, IEEE Transactions on Industrial Electronics.

[35]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[36]  Lars Petersson,et al.  DeNet: Scalable Real-Time Object Detection with Directed Sparse Sampling , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[39]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[40]  Jing Liu,et al.  Robust Structured Subspace Learning for Data Representation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[43]  Jinhui Tang,et al.  CAD: Scale Invariant Framework for Real-Time Object Detection , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).