Joint background reconstruction and foreground segmentation via a two-stage convolutional neural network

Foreground segmentation in video sequences is a classic topic in computer vision. Due to the lack of semantic and prior knowledge, it is difficult for existing methods to deal with sophisticated scenes well. Therefore, in this paper, we propose an end-to-end two-stage deep convolutional neural network (CNN) framework for foreground segmentation in video sequences. In the first stage, a convolutional encoder-decoder sub-network is employed to reconstruct the background images and encode rich prior knowledge of background scenes. In the second stage, the reconstructed background and current frame are input into a multi-channel fully-convolutional sub-network (MCFCN) for accurate foreground segmentation. In the two-stage CNN, the reconstruction loss and segmentation loss are jointly optimized. The background images and foreground objects are output simultaneously in an end-to-end way. Moreover, by incorporating the prior semantic knowledge of foreground and background in the pre-training process, our method could restrain the background noise and keep the integrity of foreground objects at the same time. Experiments on CDNet 2014 show that our method outperforms the state-of-the-art by 4.9%.

[1]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Guillaume-Alexandre Bilodeau,et al.  A Self-Adjusting Approach to Change Detection Based on Background Word Consensus , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[3]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[4]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[5]  Wilfried Philips,et al.  C-EFIC: Color and Edge Based Foreground Background Segmentation with Interior Classification , 2015, VISIGRAPP.

[6]  Hanqing Lu,et al.  Learning sharable models for robust background subtraction , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).

[7]  Mohammed Bennamoun,et al.  Weakly Supervised Change Detection in a Pair of Images , 2016, ArXiv.

[8]  Larry S. Davis,et al.  Non-parametric Model for Background Subtraction , 2000, ECCV.

[9]  Marc Van Droogenbroeck,et al.  ViBe: A Universal Background Subtraction Algorithm for Video Sequences , 2011, IEEE Transactions on Image Processing.

[10]  Rui Wang,et al.  Static and Moving Object Detection Using Flux Tensor with Split Gaussian Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[11]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[13]  Fatih Murat Porikli,et al.  CDnet 2014: An Expanded Change Detection Benchmark Dataset , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[14]  Xiangjian He,et al.  A unified model sharing framework for moving object detection , 2016, Signal Process..

[15]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[16]  Alex Pentland,et al.  A Bayesian Computer Vision System for Modeling Human Interaction , 1999, ICVS.

[17]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[18]  Brendt Wohlberg,et al.  Incremental Principal Component Pursuit for Video Background Modeling , 2015, Journal of Mathematical Imaging and Vision.

[19]  Guillaume-Alexandre Bilodeau,et al.  SuBSENSE: A Universal Change Detection Method With Local Adaptive Sensitivity , 2015, IEEE Transactions on Image Processing.

[20]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Alex Pentland,et al.  A Bayesian Computer Vision System for Modeling Human Interactions , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[23]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[24]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[25]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[26]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.