TDFSSD: Top-Down Feature Fusion Single Shot MultiBox Detector

Abstract Object detection across different scales is challenging as the variances of object scales. Thus, a novel detection network, Top-Down Feature Fusion Single Shot MultiBox Detector (TDFSSD), is proposed. The proposed network is based on Single Shot MultiBox Detector (SSD) using VGG-16 as backbone with a novel, simple yet efficient feature fusion module, namely, the Top-Down Feature Fusion Module. The proposed module fuses features from higher-level features, containing semantic information, to lower-level features, containing boundary information, iteratively. Extensive experiments have been conducted on PASCAL VOC2007, PASCAL VOC2012, and MS COCO datasets to demonstrate the efficiency of the proposed method. The proposed TDFSSD network is trained end to end and outperforms the state-of-the-art methods across the three datasets. The TDFSSD network achieves 81.7% and 80.1% mAPs on VOC2007 and 2012 respectively, which outperforms the reported best results of both one-stage and two-stage frameworks. In the meantime, it achieves 33.4% mAP on MS COCO test-dev, especially 17.2% average precision (AP) on small objects. Thus all the results show the efficiency of the proposed method on object detection. Code and model are available at: https://github.com/dongfengxijian/TDFSSD .

[1]  Nitin Khanna,et al.  DCT-domain Deep Convolutional Neural Networks for Multiple JPEG Compression Classification , 2017, Signal Process. Image Commun..

[2]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[3]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jitendra Malik,et al.  Beyond Skip Connections: Top-Down Modulation for Object Detection , 2016, ArXiv.

[6]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[7]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Joseph O. Deasy,et al.  Multiple Resolution Residually Connected Feature Streams for Automatic Lung Tumor Segmentation From CT Images , 2019, IEEE Transactions on Medical Imaging.

[10]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Zhao Baojun,et al.  Multi-scale object detection by top-down and bottom-up feature pyramid network , 2019, Journal of Systems Engineering and Electronics.

[13]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Gang Wang,et al.  Door recognition and deep learning algorithm for visual based robot navigation , 2014, 2014 IEEE International Conference on Robotics and Biomimetics (ROBIO 2014).

[16]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[17]  Nojun Kwak,et al.  Enhancement of SSD by concatenating feature maps for object detection , 2017, BMVC.

[18]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Xinlong Lu,et al.  ADN for object detection , 2020, IET Comput. Vis..

[20]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[21]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[22]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Edward H. Adelson,et al.  PYRAMID METHODS IN IMAGE PROCESSING. , 1984 .

[24]  C.-C. Jay Kuo,et al.  Fast face detection on mobile devices by leveraging global and local facial characteristics , 2019, Signal Process. Image Commun..

[25]  Xiaosong Li,et al.  Tiny YOLO Optimization Oriented Bus Passenger Object Detection , 2020 .

[26]  Jonathan Cheung-Wai Chan,et al.  Coarse-to-fine salient object detection based on deep convolutional neural networks , 2018, Signal Process. Image Commun..

[27]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[28]  Xuerui Dai,et al.  HybridNet: A fast vehicle detection system for autonomous driving , 2019, Signal Process. Image Commun..

[29]  Ming Liu,et al.  Colonic Polyp Detection in Endoscopic Videos With Single Shot Detection Based Deep Convolutional Neural Network , 2019, IEEE Access.

[30]  Yu Liu,et al.  Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection , 2017, International Journal of Computer Vision.

[31]  Fuchun Sun,et al.  RON: Reverse Connection with Objectness Prior Networks for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[34]  Ming-Sui Lee,et al.  Online CNN-based multiple object tracking with enhanced model updates and identity association , 2018, Signal Process. Image Commun..

[35]  Stefano Tubaro,et al.  Deep Convolutional Neural Networks for pedestrian detection , 2015, Signal Process. Image Commun..

[36]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[37]  Abhinav Gupta,et al.  Contextual Priming and Feedback for Faster R-CNN , 2016, ECCV.

[38]  Fuqiang Zhou,et al.  FSSD: Feature Fusion Single Shot Multibox Detector , 2017, ArXiv.

[39]  Zhiqiang Shen,et al.  DSOD: Learning Deeply Supervised Object Detectors from Scratch , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[41]  David A. Forsyth,et al.  30Hz Object Detection with DPM V5 , 2014, ECCV.

[42]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[43]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[44]  Huicheng Zheng,et al.  Feature Enhancement for Multi-scale Object Detection , 2020, Neural Processing Letters.