Space-Time Joint Multi-layer Segmentation and Depth Estimation

Video-based segmentation and reconstruction techniques are predominantly extensions of image-domain techniques that treat each frame independently. These approaches ignore the temporal information contained in the input video, which can lead to temporally incoherent results. We propose a framework for joint segmentation and reconstruction which explicitly enforces temporal consistency by formulating the problem as an energy minimisation generalised to groups of frames. The main idea is to use optical flow in combination with a confidence measure to impose robust temporal smoothness constraints. Optimisation is performed using recent advances in graph cuts, combined with practical considerations to reduce run-time and memory consumption. Experimental results on real sequences containing rapid motion demonstrate that the method improves spatio-temporal coherence in both segmentation and reconstruction, without introducing degradation in regions where optical flow fails due to fast motion.
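
To make the idea of a confidence-weighted temporal smoothness constraint concrete, the sketch below shows one possible form such a term could take. It is an illustration under stated assumptions, not the formulation used in the paper: the confidence is assumed to come from a forward-backward flow consistency check with a Gaussian falloff, the penalty is assumed to be a Potts term on labels linked by the flow, and the names `flow_confidence`, `temporal_energy`, `sigma` and `weight` are hypothetical.

```python
import math


def _target(x, y, flow, width, height):
    """Nearest pixel reached from (x, y) by following flow, clamped to the image."""
    u, v = flow[y][x]
    xt = min(max(int(round(x + u)), 0), width - 1)
    yt = min(max(int(round(y + v)), 0), height - 1)
    return xt, yt


def flow_confidence(forward, backward, x, y, sigma=1.0):
    """Confidence in (0, 1] from forward-backward flow consistency (assumed measure).

    forward[y][x] is the (u, v) flow from frame t to t+1 and backward[y][x]
    the flow from t+1 to t. Consistent flows cancel, giving confidence 1.
    """
    height, width = len(forward), len(forward[0])
    u, v = forward[y][x]
    xt, yt = _target(x, y, forward, width, height)
    ub, vb = backward[yt][xt]
    err2 = (u + ub) ** 2 + (v + vb) ** 2
    return math.exp(-err2 / (2.0 * sigma ** 2))


def temporal_energy(labels_t, labels_t1, forward, backward, weight=1.0, sigma=1.0):
    """Sum of confidence-weighted Potts penalties along flow correspondences.

    A label change between flow-linked pixels costs weight * confidence, so
    unreliable flow (low confidence) imposes almost no temporal constraint.
    """
    height, width = len(labels_t), len(labels_t[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            xt, yt = _target(x, y, forward, width, height)
            if labels_t[y][x] != labels_t1[yt][xt]:
                total += weight * flow_confidence(forward, backward, x, y, sigma)
    return total
```

With a term of this shape, a label change across frames is penalised strongly only where the flow is trustworthy, which matches the stated goal of avoiding degradation in regions where optical flow fails. In a full system such a term would be added to data and spatial smoothness terms and minimised jointly, for example with graph cuts.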
