Depth Map Estimation for Free-Viewpoint Television and Virtual Navigation

The paper presents a new depth estimation method dedicated to free-viewpoint television (FTV) and virtual navigation (VN). The method uses multiple arbitrarily positioned input views simultaneously to produce depth maps with high inter-view and temporal consistency. Estimation is performed for segments, and the segment size controls the trade-off between the quality of the depth maps and the processing time. Additionally, an original technique is proposed to improve the temporal consistency of depth maps. It uses temporal prediction of depth, so depth is estimated for P-type depth frames. For such frames, temporal consistency is high and estimation complexity is relatively low. As in video coding, I-type depth frames with no temporal depth prediction are used to achieve robustness. Moreover, we propose a novel parallelization technique that significantly reduces the estimation time. The method is implemented in C++ software provided together with this paper, so other researchers may use it as a new reference in their future work. The experiments followed MPEG methodology whenever possible. The results demonstrate advantages over the Depth Estimation Reference Software (DERS) developed by MPEG. The fidelity of a depth map, measured by the quality of synthesized views, is higher by 2.6 dB on average. This quality improvement is obtained even though the estimation time is reduced 4.5 times on average. Applying the proposed temporal consistency enhancement increases this reduction to 29 times. With the proposed parallelization, the estimation time is reduced by up to 130 times (using 6 threads). As there is no commonly accepted measure of depth map consistency, the compression efficiency of depth is proposed as such a measure.
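The I-type/P-type arrangement described above can be illustrated with a minimal sketch. It only labels depth frames and does not estimate depth. The names `DepthFrameType`, `scheduleDepthFrames`, `toString`, and the fixed I-frame period `iPeriod` are hypothetical and are not taken from the provided software; the abstract does not state how often I-type frames occur, so a fixed period is purely an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical labels: I = estimated without temporal depth prediction,
// P = estimated using temporal prediction from an earlier depth frame.
enum class DepthFrameType { I, P };

// Assumption: an I-type frame is inserted every `iPeriod` frames,
// by analogy with an intra period in video coding.
std::vector<DepthFrameType> scheduleDepthFrames(std::size_t frameCount,
                                                std::size_t iPeriod) {
    assert(iPeriod > 0 && "I-frame period must be positive");
    std::vector<DepthFrameType> types;
    types.reserve(frameCount);
    for (std::size_t i = 0; i < frameCount; ++i) {
        // Frame 0 and every iPeriod-th frame after it are I-type.
        types.push_back(i % iPeriod == 0 ? DepthFrameType::I
                                         : DepthFrameType::P);
    }
    return types;
}

// Renders a schedule as a string of 'I' and 'P' characters.
std::string toString(const std::vector<DepthFrameType>& types) {
    std::string out;
    out.reserve(types.size());
    for (DepthFrameType t : types) {
        out.push_back(t == DepthFrameType::I ? 'I' : 'P');
    }
    return out;
}
```

In this sketch, a shorter period would favor robustness (more frames estimated independently), while a longer one would favor the lower complexity and higher temporal consistency attributed to P-type frames.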
