Joint Multi-Layer Segmentation and Reconstruction for Free-Viewpoint Video Applications

Current state-of-the-art image-based scene reconstruction techniques are capable of generating high-fidelity 3D models when used under controlled capture conditions. However, they are often inadequate when used in more challenging environments such as sports scenes with moving cameras. Algorithms must be able to cope with relatively large calibration and segmentation errors as well as input images separated by a wide-baseline and possibly captured at different resolutions. In this paper, we propose a technique which, under these challenging conditions, is able to efficiently compute a high-quality scene representation via graph-cut optimisation of an energy function combining multiple image cues with strong priors. Robustness is achieved by jointly optimising scene segmentation and multiple view reconstruction in a view-dependent manner with respect to each input camera. Joint optimisation prevents propagation of errors from segmentation to reconstruction as is often the case with sequential approaches. View-dependent processing increases tolerance to errors in through-the-lens calibration compared to global approaches. We evaluate our technique in the case of challenging outdoor sports scenes captured with manually operated broadcast cameras as well as several indoor scenes with natural background. A comprehensive experimental evaluation including qualitative and quantitative results demonstrates the accuracy of the technique for high quality segmentation and reconstruction and its suitability for free-viewpoint video under these difficult conditions.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Lance Williams,et al.  View Interpolation for Image Synthesis , 1993, SIGGRAPH.

[3]  A. Laurentini,et al.  The Visual Hull Concept for Silhouette-Based Image Understanding , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Dinesh Manocha,et al.  I-COLLIDE: an interactive and exact collision detection system for large-scale environments , 1995, I3D '95.

[5]  Steven M. Seitz,et al.  View morphing , 1996, SIGGRAPH.

[6]  Richard Szeliski,et al.  The lumigraph , 1996, SIGGRAPH.

[7]  Jitendra Malik,et al.  Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach , 1996, SIGGRAPH.

[8]  Marc Levoy,et al.  Light field rendering , 1996, SIGGRAPH.

[9]  Saied Moezzi,et al.  Virtual View Generation for 3D Digital Video , 1997, IEEE Multim..

[10]  Ingemar J. Cox,et al.  A maximum-flow formulation of the N-camera stereo correspondence problem , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[11]  Takeo Kanade,et al.  Constructing virtual worlds using dense stereo , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[12]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[13]  Paul A. Viola,et al.  Roxels: responsibility weighted 3D volume reconstruction , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[14]  Kiriakos N. Kutulakos Approximate N-View Stereo , 2000, ECCV.

[15]  Ramesh Raskar,et al.  Image-based visual hulls , 2000, SIGGRAPH.

[16]  Zhengyou Zhang,et al.  A Flexible New Technique for Camera Calibration , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  R. Zabih,et al.  Exact voxel occupancy with graph cuts , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[18]  R. Cipolla,et al.  A probabilistic framework for space carving , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[19]  Thomas Malzbender,et al.  A Survey of Methods for Volumetric Scene Reconstruction from Photographs , 2001, VG.

[20]  Paul A. Beardsley,et al.  Image-based 3D photography using opacity hulls , 2002, ACM Trans. Graph..

[21]  Vladimir Kolmogorov,et al.  Multi-camera Scene Reconstruction via Graph Cuts , 2002, ECCV.

[22]  David Salesin,et al.  Video matting of complex scenes , 2002, SIGGRAPH.

[23]  Francis Schmitt,et al.  Silhouette and stereo fusion for 3D object modeling , 2003, Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings..

[24]  Marcus A. Magnor,et al.  Joint 3D-reconstruction and background separation in multiple views using graph cuts , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[25]  Ian D. Reid,et al.  A Multiple View Layered Representation for Dynamic Novel View Synthesis , 2003, BMVC.

[26]  Yuichi Ohta,et al.  Live 3D Video in Soccer Stadium , 2003, SIGGRAPH '03.

[27]  Edmond Boyer,et al.  Exact polyhedral visual hulls , 2003, BMVC.

[28]  Francis Schmitt,et al.  Silhouette and stereo fusion for 3D object modeling , 2003, Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings..

[29]  Richard Szeliski,et al.  High-quality video view interpolation using a layered representation , 2004, SIGGRAPH 2004.

[30]  R. Zabih,et al.  What energy functions can be minimized via graph cuts , 2004 .

[31]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Vladimir Kolmogorov,et al.  "GrabCut": interactive foreground extraction using iterated graph cuts , 2004, ACM Trans. Graph..

[33]  Richard Szeliski,et al.  Stereo Matching with Transparency and Matting , 1999, International Journal of Computer Vision.

[34]  Steven M. Seitz,et al.  Photorealistic Scene Reconstruction by Voxel Coloring , 1997, International Journal of Computer Vision.

[35]  Kiriakos N. Kutulakos,et al.  A Theory of Shape by Space Carving , 2000, International Journal of Computer Vision.

[36]  Harry Shum,et al.  Video object cut and paste , 2005, ACM Trans. Graph..

[37]  Edmond Boyer,et al.  Fusion of multiview silhouette cues using a space occupancy grid , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[38]  Roberto Cipolla,et al.  Multi-view stereo via volumetric graph-cuts , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[39]  Adrian Hilton,et al.  Virtual view synthesis of people from multiple view video sequences , 2005, Graph. Model..

[40]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[41]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Marc Pollefeys,et al.  Multi-view reconstruction using photo-consistency and exact silhouette constraints: a maximum-flow formulation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[43]  Michael M. Kazhdan,et al.  Poisson surface reconstruction , 2006, SGP '06.

[44]  Harry Shum,et al.  Background Cut , 2006, ECCV.

[45]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[46]  Andrew Blake,et al.  Probabilistic Fusion of Stereo with Color and Contrast for Bilayer Segmentation , 2006, IEEE Trans. Pattern Anal. Mach. Intell..

[47]  Saito,et al.  Player viewpoint video synthesis using multiple cameras , 2006 .

[48]  Harry Shum,et al.  Image-based rendering , 2006, Found. Trends Comput. Graph. Vis..

[49]  Michael Goesele,et al.  Multi-View Stereo Revisited , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[50]  Adrian Hilton,et al.  Surface Capture for Performance-Based Animation , 2007, IEEE Computer Graphics and Applications.

[51]  Roberto Cipolla,et al.  Multiview Stereo via Volumetric Graph-Cuts and Occlusion Robust Photo-Consistency , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  J. Starck,et al.  A Robust Free-Viewpoint Video System for Sport Scenes , 2007, 2007 3DTV Conference.

[53]  Oliver Grau,et al.  A Bayesian Framework for Simultaneous Matting and 3D Reconstruction , 2007, Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007).

[54]  Leif Kobbelt,et al.  A Surface-Growing Approach to Multi-View Stereo Reconstruction , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Oliver Grau,et al.  Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events , 2007, Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007).

[56]  Graham A. Thomas,et al.  Real-time camera tracking using sports pitch markings , 2007, Journal of Real-Time Image Processing.

[57]  Pushmeet Kohli,et al.  Dynamic Graph Cuts for Efficient Inference in Markov Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  Markus H. Gross,et al.  3D Video Billboard Clouds , 2007, Comput. Graph. Forum.

[59]  Hideo Saito,et al.  Virtual Viewpoint Replay for a Soccer Match by View Interpolation From Multiple Cameras , 2007, IEEE Transactions on Multimedia.

[60]  Adrian Hilton,et al.  Wand-based Multiple Camera Studio Calibration , 2007 .

[61]  Dani Lischinski,et al.  A Closed-Form Solution to Natural Image Matting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[62]  Pushmeet Kohli,et al.  Reduce, reuse & recycle: Efficiently solving multi-label MRFs , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Derek Bradley,et al.  Accurate multi-view reconstruction using robust binocular stereo and surface meshing , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  Roberto Cipolla,et al.  Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo , 2008, ECCV.

[65]  Anita Sellent,et al.  Floating Textures , 2008, Comput. Graph. Forum.

[66]  Richard Szeliski,et al.  A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[67]  Jean-Yves Guillemaut,et al.  Objective Quality Assessment in Free-Viewpoint Video Production , 2008, 2008 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video.

[68]  Jean-Yves Guillemaut,et al.  Robust graph-cut scene segmentation and reconstruction for free-viewpoint video of complex dynamic scenes , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[69]  Qionghai Dai,et al.  Continuous depth estimation for multi-view stereo , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[70]  Guillermo Sapiro,et al.  Video SnapCut: robust video object cutout using localized classifiers , 2009, SIGGRAPH 2009.

[71]  Pieter Peers,et al.  Dynamic shape capture using multi-view photometric stereo , 2009, ACM Trans. Graph..

[72]  Jovan Popović,et al.  Dynamic shape capture using multi-view photometric stereo , 2009, SIGGRAPH 2009.

[73]  M. Pollefeys,et al.  Unstructured video-based rendering: interactive exploration of casually captured videos , 2010, ACM Trans. Graph..

[74]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[75]  Michael Schmeing,et al.  A Closed Form Solution to Natural Image Matting , 2010 .

[76]  Roberto Cipolla,et al.  Automatic 3D object segmentation in multiple views using volumetric graph-cuts , 2007, Image Vis. Comput..

[77]  Markus H. Gross,et al.  Articulated Billboards for Video‐based Rendering , 2010, Comput. Graph. Forum.