Free viewpoint video (FVV) survey and future research direction

Free viewpoint video (FVV) is one of the new trends in the development of advanced visual media types. It aims to provide a new immersive user experience and a level of interactivity that go beyond higher image quality (HD/4K TV) and higher realism (3D TV). Potential applications include interactive personal visualization and free viewpoint navigation. The goal of this paper is to provide an overview of the FVV system and some target application scenarios. Associated standardization activities and the technological barriers that remain to be overcome are also described. The paper is organized as follows. Section I gives a general description of the FVV system and its functionalities. Since an FVV system is composed of a chain of processing modules, Section II provides an in-depth functional description of each module. Section III gives examples of emerging FVV applications and use cases. Section IV summarizes the technical challenges that must be overcome for wider usage and market penetration of FVV.
