Virtual view synthesis of people from multiple view video sequences

This paper addresses the synthesis of novel views of people from multiple view video. The target application is the multiple-camera 3D Virtual Studio for broadcast production, which requires free-viewpoint video synthesis for a virtual camera at the same quality as the captured video. A framework is introduced for view-dependent optimisation of reconstructed surface shape, aligning multiple captured images with sub-pixel accuracy for rendering novel views. View-dependent shape optimisation combines multiple view stereo and silhouette constraints to robustly estimate correspondence between images in the presence of visual ambiguities such as uniform surface regions, self-occlusion, and camera calibration error. Free-viewpoint rendering of video sequences of people achieves a visual quality comparable to that of the captured video images. Experimental evaluation demonstrates that this approach overcomes limitations of previous stereo- and silhouette-based approaches to rendering novel views of moving people.
