Efficient virtual view rendering by merging pre-rendered RGB-D data from multiple cameras

A virtual view, or free-viewpoint video/image, shows an object from an arbitrary viewpoint in 3D space. One typical approach captures multiview RGB videos with multiple RGB cameras surrounding the object and renders a virtual view by estimating the object's 3D shape from those videos. However, estimating 3D geometry from 2D images is difficult to do accurately and efficiently. RGB-D (RGB-Depth) cameras, which have recently become available, capture an RGB video together with a per-pixel depth, i.e., the distance from the camera to the object surface. With an RGB-D camera, the 3D shape of the object surface is obtained directly, without estimating 3D from 2D. However, a single RGB-D camera captures only the part of the surface that faces it. In this research, we propose a method that efficiently renders a virtual view using multiple RGB-D cameras. In our method, the 3D shapes of the different surface parts captured by the respective cameras are merged efficiently according to the virtual viewpoint. Each camera is connected to a PC, and the PCs are interconnected as a cluster for parallel processing. The RGB-D data captured by the cameras must be transferred over this network before they can be merged. Our method effectively reduces the amount of RGB-D data to transfer by "view-dependent pre-rendering", in which "imperfect" virtual views are rendered in parallel on the respective PCs from the original RGB-D data of each camera. This pre-rendering greatly contributes to real-time rendering of the final virtual view.
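
The sketch below is a minimal illustration (not the authors' implementation; all names are hypothetical) of the final merge step that the abstract describes: each camera's PC is assumed to have already pre-rendered its own "imperfect" virtual view in the virtual camera's image plane, with depth set to +inf wherever that camera saw nothing, so the combining PC only has to perform a per-pixel z-buffer merge of the partial views it receives over the network.

```python
import numpy as np

def merge_prerendered_views(colors, depths):
    """Merge per-camera pre-rendered partial views into one virtual view.

    colors : list of (H, W, 3) uint8 arrays, one per camera
    depths : list of (H, W) float arrays, +inf where the camera saw nothing
    """
    colors = np.stack(colors)                 # (N, H, W, 3)
    depths = np.stack(depths)                 # (N, H, W)
    nearest = np.argmin(depths, axis=0)       # index of closest camera per pixel
    rows, cols = np.indices(nearest.shape)
    merged = colors[nearest, rows, cols]      # take the color of the nearest surface
    visible = np.isfinite(depths.min(axis=0)) # pixels seen by at least one camera
    merged[~visible] = 0                      # unseen pixels stay background (black)
    return merged

# Toy example with two hypothetical 2x2 partial views
if __name__ == "__main__":
    c1 = np.full((2, 2, 3), 200, np.uint8)
    d1 = np.array([[1.0, np.inf], [2.0, np.inf]])
    c2 = np.full((2, 2, 3), 50, np.uint8)
    d2 = np.array([[1.5, 0.8], [np.inf, np.inf]])
    print(merge_prerendered_views([c1, c2], [d1, d2]))
```

Because each partial view is already aligned to the virtual viewpoint, only the pixels that survive this depth test need to carry color and depth across the network, which is the source of the data reduction claimed above.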
