Dynamic Background Learning through Deep Auto-encoder Networks

Background learning is a pre-processing step for motion detection, which is a basic step of video analysis. For static backgrounds, many previous works have already achieved good performance. However, results on learning dynamic backgrounds still leave much room for improvement. To address this challenge, this paper proposes a novel and practical method based on deep auto-encoder networks. First, dynamic background images are extracted from video frames containing moving objects by a deep auto-encoder network (called the Background Extraction Network). Then, a dynamic background model is learned by another deep auto-encoder network (called the Background Learning Network), using the extracted background images as input. For greater flexibility, our background model can be updated online to absorb more training samples. Our main contributions are: 1) a cascade of two deep auto-encoder networks that separates the dynamic background from foreground objects very efficiently; 2) an online learning method adopted to accelerate the training of the Background Extraction Network. Compared with previous algorithms, our approach achieves the best performance on six benchmark data sets. In particular, the experiments show that our algorithm handles backgrounds with large variations very well.
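The two-stage cascade described above can be illustrated with a small sketch of the data flow only, under loudly stated assumptions. Each "deep" auto-encoder is replaced by a single-hidden-layer sigmoid auto-encoder trained by plain gradient descent on squared error. The extraction rule (pixels the first network reconstructs poorly are replaced by its reconstruction), the threshold `tau`, the synthetic 8x8 frames with a moving ripple, and every name (`AutoEncoder`, `partial_fit`, `extract_backgrounds`, `foreground_mask`, `make_frames`) are hypothetical and are not taken from the paper. Online updating is modeled simply by calling `partial_fit` again on a new batch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class AutoEncoder:
    """Assumption: a one-hidden-layer sigmoid auto-encoder as a shallow stand-in
    for the deep networks in the paper."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def reconstruct(self, X):
        h = sigmoid(X @ self.W1 + self.b1)
        return sigmoid(h @ self.W2 + self.b2)

    def error(self, X):
        # Mean squared reconstruction error per pixel.
        return float(np.mean((self.reconstruct(X) - X) ** 2))

    def partial_fit(self, X, epochs=300, lr=0.5):
        # Full-batch gradient descent on 0.5 * squared error, averaged over frames.
        # Calling it again on new frames continues from the current weights,
        # which is how this sketch models online updating.
        n = X.shape[0]
        for _ in range(epochs):
            h = sigmoid(X @ self.W1 + self.b1)
            out = sigmoid(h @ self.W2 + self.b2)
            d_z2 = (out - X) / n * out * (1.0 - out)
            d_z1 = (d_z2 @ self.W2.T) * h * (1.0 - h)
            self.W2 -= lr * (h.T @ d_z2)
            self.b2 -= lr * d_z2.sum(axis=0)
            self.W1 -= lr * (X.T @ d_z1)
            self.b1 -= lr * d_z1.sum(axis=0)
        return self


def extract_backgrounds(frames, ben, tau=0.3):
    # Hypothetical rule, NOT the paper's cost function: pixels that the
    # extraction network reconstructs poorly are treated as foreground and
    # replaced by the network's reconstruction.
    recon = ben.reconstruct(frames)
    return np.where(np.abs(frames - recon) > tau, recon, frames)


def foreground_mask(frame, bln, tau=0.3):
    # Pixels far from the learned background model are labeled foreground.
    recon = bln.reconstruct(frame[None, :])[0]
    return np.abs(frame - recon) > tau


def make_frames(n, rng, side=8, p_object=0.3):
    # Synthetic 8x8 "video": a gradient background with a moving ripple
    # (the dynamic part) plus noise; some frames get a bright 3x3 object.
    t = np.arange(n)[:, None]
    col = (np.arange(side * side) % side)[None, :]
    base = np.linspace(0.2, 0.4, side * side)[None, :]
    frames = base + 0.05 * np.sin(2 * np.pi * (col / side + t / 10))
    frames = frames + rng.normal(0.0, 0.02, frames.shape)
    objects = np.zeros(frames.shape, dtype=bool)
    for i in range(n):
        if rng.random() < p_object:
            r, c = rng.integers(0, side - 2, size=2)
            patch = np.zeros((side, side), dtype=bool)
            patch[r:r + 3, c:c + 3] = True
            objects[i] = patch.ravel()
    frames[objects] = 1.0
    return np.clip(frames, 0.0, 1.0), objects


rng = np.random.default_rng(0)
frames, _ = make_frames(200, rng)

# Stage 1: Background Extraction Network trained on frames that contain objects.
ben = AutoEncoder(64, 16, seed=1)
err_before = ben.error(frames)
ben.partial_fit(frames)
err_after = ben.error(frames)
backgrounds = extract_backgrounds(frames, ben)

# Stage 2: Background Learning Network trained on the extracted backgrounds.
bln = AutoEncoder(64, 16, seed=2).partial_fit(backgrounds)

# Online update: absorb a new batch without retraining from scratch.
new_frames, _ = make_frames(50, rng)
bln.partial_fit(extract_backgrounds(new_frames, ben), epochs=50)

test_frame, test_obj = make_frames(1, rng, p_object=1.0)
mask = foreground_mask(test_frame[0], bln)
print(f"BEN reconstruction error: {err_before:.4f} -> {err_after:.4f}")
print("foreground pixels:", int(mask.sum()), "| true object pixels:", int(test_obj[0].sum()))
```

Running it prints the extraction network's error before and after training and compares the number of detected foreground pixels with the 9 pixels of the planted object. With this little capacity and training, both networks may largely learn a per-pixel average, so the sketch tolerates the ripple only because its amplitude stays below `tau`; handling backgrounds with larger variation is the job the abstract assigns to the deep networks.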
