Foreground Gating and Background Refining Network for Surveillance Object Detection

Detecting objects in surveillance videos is an important problem because of its wide applications in traffic control and public security. Existing methods often suffer degraded performance because of false positives or misalignment. We propose a novel framework, the Foreground Gating and Background Refining Network (FG-BR Net), for surveillance object detection (SOD). To reduce false positives in background regions, a critical problem in SOD, we introduce a new module that first subtracts the background of a video sequence and then generates high-quality region proposals. Previous background subtraction methods may wrongly remove static foreground objects from a frame; in contrast, our model adds a feedback connection from the detection results to the background subtraction process, so that both static and moving objects in surveillance videos are captured. Furthermore, we introduce a second module, the background refining stage, which refines the detection results with more accurate localization. Pairwise non-local operations are adopted to cope with misalignment between the features of the original and background frames. Extensive experiments on real-world traffic surveillance benchmarks demonstrate the competitive performance of the proposed FG-BR Net. In particular, FG-BR Net ranks first among all methods on the hard and sunny subsets of the UA-DETRAC detection dataset, without any bells and whistles.
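The feedback idea can be illustrated with a minimal sketch. The abstract does not say how the feedback connection is implemented, so everything below is an assumption made for illustration: a running-average background model, a hypothetical `(x0, y0, x1, y1)` box format, and the helper names `update_background` and `foreground_mask`. It shows only the general principle that pixels covered by a detection are kept out of the background update, so a stationary object is not absorbed into the background.

```python
from typing import List, Tuple

Frame = List[List[float]]        # grayscale image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive; hypothetical format


def in_any_box(x: int, y: int, boxes: List[Box]) -> bool:
    """Return True if pixel (x, y) lies inside at least one detection box."""
    return any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in boxes)


def update_background(background: Frame, frame: Frame,
                      boxes: List[Box], alpha: float = 0.5) -> Frame:
    """Running-average background update with detection feedback.

    Pixels inside a detected box keep their old background value
    (assumed gating rule); all other pixels are blended toward the frame.
    """
    return [
        [
            bg if in_any_box(x, y, boxes) else (1 - alpha) * bg + alpha * px
            for x, (bg, px) in enumerate(zip(bg_row, row))
        ]
        for y, (bg_row, row) in enumerate(zip(background, frame))
    ]


def foreground_mask(background: Frame, frame: Frame,
                    threshold: float = 0.25) -> List[List[bool]]:
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [
        [abs(px - bg) > threshold for bg, px in zip(bg_row, row)]
        for bg_row, row in zip(background, frame)
    ]
```

Calling `update_background` repeatedly on an unchanging frame with an empty box list lets a parked object fade into the background, whereas passing the object's box on each call keeps it in the foreground mask. A learned detector and a stronger background model would replace these toy pieces in practice.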
