Efficient virtual view rendering by merging pre-rendered RGB-D data from multiple cameras

A virtual view, or free-viewpoint video/image, shows an object from an arbitrary viewpoint in 3D space. One typical approach captures multiview RGB videos with multiple RGB cameras surrounding the object and renders a virtual view by estimating the object's 3D shape from those videos. However, estimating 3D geometry from 2D images is difficult to do accurately and efficiently. RGB-D (RGB-Depth) cameras, which have recently become available, capture an RGB video together with a per-pixel depth, i.e., the distance from the camera to the object surface. With an RGB-D camera, the 3D shape of the object surface is obtained directly, without estimating 3D from 2D. However, a single RGB-D camera captures only the part of the surface that faces it. In this research, we propose a method that efficiently renders a virtual view using multiple RGB-D cameras. In our method, the 3D shapes of the different surface parts captured by the respective cameras are merged efficiently according to the virtual viewpoint. Each camera is connected to a PC, and the PCs are interconnected as a cluster for parallel processing. The RGB-D data captured by the cameras must be transferred over this network before they can be merged. Our method effectively reduces the amount of RGB-D data to transfer by "view-dependent pre-rendering", in which "imperfect" virtual views are rendered in parallel on the respective PCs from the original RGB-D data of each camera. This pre-rendering greatly contributes to real-time rendering of the final virtual view.
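
The sketch below is a minimal illustration (not the authors' implementation; all names are hypothetical) of the final merge step that the abstract describes: each camera's PC is assumed to have already pre-rendered its own "imperfect" virtual view in the virtual camera's image plane, with depth set to +inf wherever that camera saw nothing, so the combining PC only has to perform a per-pixel z-buffer merge of the partial views it receives over the network.

```python
import numpy as np

def merge_prerendered_views(colors, depths):
    """Merge per-camera pre-rendered partial views into one virtual view.

    colors : list of (H, W, 3) uint8 arrays, one per camera
    depths : list of (H, W) float arrays, +inf where the camera saw nothing
    """
    colors = np.stack(colors)                 # (N, H, W, 3)
    depths = np.stack(depths)                 # (N, H, W)
    nearest = np.argmin(depths, axis=0)       # index of closest camera per pixel
    rows, cols = np.indices(nearest.shape)
    merged = colors[nearest, rows, cols]      # take the color of the nearest surface
    visible = np.isfinite(depths.min(axis=0)) # pixels seen by at least one camera
    merged[~visible] = 0                      # unseen pixels stay background (black)
    return merged

# Toy example with two hypothetical 2x2 partial views
if __name__ == "__main__":
    c1 = np.full((2, 2, 3), 200, np.uint8)
    d1 = np.array([[1.0, np.inf], [2.0, np.inf]])
    c2 = np.full((2, 2, 3), 50, np.uint8)
    d2 = np.array([[1.5, 0.8], [np.inf, np.inf]])
    print(merge_prerendered_views([c1, c2], [d1, d2]))
```

Because each partial view is already aligned to the virtual viewpoint, only the pixels that survive this depth test need to carry color and depth across the network, which is the source of the data reduction claimed above.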
