Simultaneous multi-body stereo and segmentation

This paper presents a novel multi-body multi-view stereo method that simultaneously recovers dense depth maps and performs segmentation from a monocular image sequence. Unlike traditional multi-view stereo approaches, which generally handle a single static scene or object, we show that depth estimation and segmentation can be jointly modeled and globally solved in an energy minimization framework for commonly encountered scenes containing multiple independently moving rigid objects. Our main contribution is a new multi-body stereo model that integrates color, geometry, and layer constraints for spatio-temporal depth recovery and automatic object segmentation. A two-pass optimization scheme is proposed to progressively update the estimates. We apply our method to a variety of challenging examples.
