Robust graph-cut scene segmentation and reconstruction for free-viewpoint video of complex dynamic scenes

Current state-of-the-art image-based scene reconstruction techniques are capable of generating high-fidelity 3D models when used under controlled capture conditions. However, they are often inadequate when used in more challenging outdoor environments with moving cameras. In this case, algorithms must be able to cope with relatively large calibration and segmentation errors as well as input images separated by a wide-baseline and possibly captured at different resolutions. In this paper, we propose a technique which, under these challenging conditions, is able to efficiently compute a high-quality scene representation via graph-cut optimisation of an energy function combining multiple image cues with strong priors. Robustness is achieved by jointly optimising scene segmentation and multiple view reconstruction in a view-dependent manner with respect to each input camera. Joint optimisation prevents propagation of errors from segmentation to reconstruction as is often the case with sequential approaches. View-dependent processing increases tolerance to errors in on-the-fly calibration compared to global approaches. We evaluate our technique in the case of challenging outdoor sports scenes captured with manually operated broadcast cameras and demonstrate its suitability for high-quality free-viewpoint video.

[1]  Yuichi Ohta,et al.  Live 3D Video in Soccer Stadium , 2003, SIGGRAPH '03.

[2]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[3]  Roberto Cipolla,et al.  Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo , 2008, ECCV.

[4]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Marcus A. Magnor,et al.  Joint 3D-reconstruction and background separation in multiple views using graph cuts , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[6]  SchmidCordelia,et al.  A Performance Evaluation of Local Descriptors , 2005 .

[7]  Ingemar J. Cox,et al.  A maximum-flow formulation of the N-camera stereo correspondence problem , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[8]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  R. Zabih,et al.  Exact voxel occupancy with graph cuts , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[10]  Michael Goesele,et al.  Multi-View Stereo Revisited , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Richard Szeliski,et al.  High-quality video view interpolation using a layered representation , 2004, SIGGRAPH 2004.

[12]  Ian D. Reid,et al.  Multiview segmentation and tracking of dynamic occluding layers , 2010, Image Vis. Comput..

[13]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Pushmeet Kohli,et al.  Reduce, reuse & recycle: Efficiently solving multi-label MRFs , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Jean-Yves Guillemaut,et al.  Objective Quality Assessment in Free-Viewpoint Video Production , 2008, 2008 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video.

[16]  Harry Shum,et al.  Background Cut , 2006, ECCV.

[17]  Andrew Blake,et al.  Probabilistic Fusion of Stereo with Color and Contrast for Bilayer Segmentation , 2006, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Takeo Kanade,et al.  Constructing virtual worlds using dense stereo , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[19]  Richard Szeliski,et al.  A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Jitendra Malik,et al.  Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach , 1996, SIGGRAPH.

[21]  Oliver Grau,et al.  A Bayesian Framework for Simultaneous Matting and 3D Reconstruction , 2007, Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007).

[22]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Vladimir Kolmogorov,et al.  Multi-camera Scene Reconstruction via Graph Cuts , 2002, ECCV.

[24]  Oliver Grau,et al.  Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events , 2007, Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007).

[25]  Kiriakos N. Kutulakos Approximate N-View Stereo , 2000, ECCV.

[26]  Graham A. Thomas Real-time camera tracking using sports pitch markings , 2007, Journal of Real-Time Image Processing.