A new spatio-temporal MRF framework for video-based object segmentation

In this paper we propose a general framework for video-based object segmentation using a new spatio-temporal Markov Random Field (MRF) model. Video-based object segmentation has the potential to improve on static image segmentation by fusing information over time. Built upon a spatio-temporal MRF model, our method offers three advantages in a unified framework. First, our model is defined on a flexible graph structure induced by local motion information instead of the regular 3D grid, allowing the model nodes to be connected to more reliable temporal neighbors. Second, the segmentation task is considered jointly with foreground/background modeling, leading to more accurate appearance models suitable for object segmentation. Third, shape priors are included as top-down, high-level object constraints that guide the bottom-up, low-level image cues, which improves segmentation. The object segmentation is posed as an MRF-MAP inference problem and solved effectively and efficiently with the Loopy Belief Propagation (LBP) algorithm. The model parameters can be estimated simultaneously with the inference, using an Expectation-Maximization (EM) algorithm. In our experiments, we show promising object segmentation results by combining multiple modules in
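
To make the MRF-MAP inference step concrete, the following is a minimal sketch of min-sum loopy belief propagation for foreground/background labeling. It is an illustration under simplifying assumptions, not the model described above: it uses a regular 4-connected grid on a single frame rather than the motion-induced spatio-temporal graph, a Potts smoothness cost, and no shape prior or EM parameter estimation. The function name `loopy_bp_segment` and the parameters `smooth` and `n_iters` are hypothetical.

```python
import numpy as np


def loopy_bp_segment(unary, smooth=0.5, n_iters=10):
    """Min-sum loopy BP on a 4-connected grid with a Potts pairwise cost.

    unary: (H, W, L) array of per-pixel label costs (e.g. negative log-likelihoods).
    Returns an (H, W) integer label map (approximate MAP labeling).
    """
    H, W, L = unary.shape
    # msgs[d][i, j] is the message pixel (i, j) receives from its neighbor:
    # d = 0 from above, 1 from below, 2 from the left, 3 from the right.
    msgs = np.zeros((4, H, W, L))

    def potts(h):
        # Min-sum message under a Potts cost: keep the label (cost h)
        # or switch from the sender's cheapest label (cost min(h) + smooth).
        m = np.minimum(h, h.min(axis=-1, keepdims=True) + smooth)
        return m - m.min(axis=-1, keepdims=True)  # normalize for stability

    for _ in range(n_iters):
        belief = unary + msgs.sum(axis=0)
        new = np.zeros_like(msgs)
        # A sender excludes the message it got from the receiver itself.
        new[0, 1:, :] = potts(belief - msgs[1])[:-1, :]   # sent downward
        new[1, :-1, :] = potts(belief - msgs[0])[1:, :]   # sent upward
        new[2, :, 1:] = potts(belief - msgs[3])[:, :-1]   # sent rightward
        new[3, :, :-1] = potts(belief - msgs[2])[:, 1:]   # sent leftward
        msgs = new  # synchronous (parallel) update schedule

    return (unary + msgs.sum(axis=0)).argmin(axis=-1)


# Toy example: every pixel prefers label 1 (foreground) except a noisy center.
unary = np.zeros((5, 5, 2))
unary[..., 0] = 1.0          # label 0 (background) costs 1 everywhere
unary[2, 2] = [0.0, 1.0]     # the center pixel alone prefers background
labels = loopy_bp_segment(unary, smooth=0.5, n_iters=10)
# The isolated center pixel is relabeled as foreground by its neighbors.
```

With `smooth=0.0` the messages carry no information and the result reduces to the per-pixel minimum of `unary`, which is a quick way to see what the pairwise term contributes.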
