Semi-supervised Deep Neural Networks for Object Detection in Video Surveillance Systems

Moving object detection under video surveillance systems is a critical task for many computer vision applications. That being said, extracting object from real-world surveillance video is still a challenging task, since various appearances and shapes, and light condition are all unsolved problems. Exploiting contexts from adjacent frames is believed to be valuable for tackling these challenges in surveillance video and is far from development. In this paper, we introduce a novel one-stage approach, named SDNN, to detect objects with multiple successive frames in videos. Specifically, the network fuses the context information of multiple frames, and combines predictions from multi-scale feature maps at different layers. The multi-frame feature fusion scheme enable the training process follows an end-to-end fashion. Experimental results conducted on surveillance video dataset show that the proposed SDNN achieved state-of-the-art results. The source code is available at https://github.com/jmuyjl/SDNN.

[1]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Antonio Torralba,et al.  Context-based vision system for place and object recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[3]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Xuelong Li,et al.  Efficient HOG human detection , 2011, Signal Process..

[5]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Zhuowen Tu,et al.  Auto-context and its application to high-level vision tasks , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[11]  Oscar Déniz-Suárez,et al.  A comparison of face and facial feature detectors based on the Viola–Jones general object detection framework , 2011, Machine Vision and Applications.

[12]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[13]  Xiaodong Wang,et al.  DFF, a Heterodimeric Protein That Functions Downstream of Caspase-3 to Trigger DNA Fragmentation during Apoptosis , 1997, Cell.

[14]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[16]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[17]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[19]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[20]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.