Development of MPEG standards for 3D and free viewpoint video

This paper gives an overview of 3D and free viewpoint video, with a special focus on the related standardization activities in MPEG. Free viewpoint video allows the user to navigate freely within real-world visual scenes, as is familiar from virtual worlds in computer graphics. Suitable 3D scene representation formats are classified and the processing chain is explained. Examples of image-based and model-based free viewpoint video systems are shown, highlighting standards-conformant realization using MPEG-4. The principles of 3D video, which provides the user with a 3D depth impression of the observed scene, are then introduced. Example systems are described, again focusing on their realization based on MPEG-4. Finally, multi-view video coding is described as a key component of 3D and free viewpoint video systems; MPEG is currently working on a new standard for multi-view video coding. The conclusion is that the necessary technology, including standard media formats for 3D and free viewpoint video, is available or will be available in the near future, and that there is clear demand from both industry and users for such applications. 3DTV at home and free viewpoint video on DVD will be available soon and will create huge new markets.
