Depth-layer-based multiview image synthesis and coding for interactive z- and x-dimension view switching

In an interactive multiview image navigation system, a user requests switches to adjacent views while observing a static 3D scene from different viewpoints. In response, the server transmits encoded data to enable client-side decoding and rendering of the requested viewpoint images. Consecutive requested viewpoint images are correlated, and this correlation can be exploited to lower the transmission rate. Previous works did so using a pixel-based synthesis and coding approach for view switches along the x-dimension (horizontal camera motion): given the texture and depth maps of the previous view, texture pixels are individually shifted horizontally to the newly requested view, each according to its disparity value, via depth-image-based rendering (DIBR). Unknown pixels in the disoccluded region of the new view (pixels not visible in the previous view) are either inpainted, or intra-coded and transmitted by the server for reconstruction at the decoder. In this paper, to enable efficient view switches along the z-dimension (camera motion into or out of the scene), we propose an alternative layer-based synthesis and coding approach. Specifically, we first divide each multiview image into depth layers, where adjacent pixels with similar depth values are grouped into the same layer. During a view switch into the scene, the spatial region of each layer is enlarged via super-resolution, with the scale factor determined by the distance between the layer and the camera. Conversely, during a view switch out of the scene, the spatial region of each layer is shrunk via low-pass filtering and down-sampling. Because rescaling reconstructs depth layers in the new view at high quality, the server needs to code and transmit a layer only in the rare case when the layer-based reconstruction is poor, saving transmission rate. Experiments show that our layer-based approach can reduce bit-rate by up to 35% compared to the previous pixel-based approach.
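
The following is a minimal, self-contained Python/NumPy/SciPy sketch of the layer-based z-dimension view switch described above, not the paper's implementation. It quantizes the depth map into layers, rescales each layer by a per-layer factor z_k / (z_k - dz) derived from a pinhole camera model, and composites the layers back-to-front. Plain bilinear interpolation stands in for the paper's super-resolution step, simple depth quantization stands in for adjacency-aware layer grouping, and all names (group_depth_layers, z_view_switch, dz) are illustrative assumptions.

# Illustrative sketch only: bilinear zoom stands in for super-resolution,
# and depth quantization stands in for adjacency-aware layer grouping.
import numpy as np
from scipy import ndimage

def group_depth_layers(depth, num_layers=4):
    """Quantize a depth map into num_layers layers of similar depth."""
    edges = np.linspace(depth.min(), depth.max(), num_layers + 1)
    labels = np.clip(np.digitize(depth, edges[1:-1]), 0, num_layers - 1)
    masks = [labels == k for k in range(num_layers)]
    return masks, edges

def center_fit(img, h, w):
    """Center-crop or zero-pad img to spatial size (h, w)."""
    ih, iw = img.shape[:2]
    out = np.zeros((h, w) + img.shape[2:], dtype=img.dtype)
    dy, sy = max((h - ih) // 2, 0), max((ih - h) // 2, 0)
    dx, sx = max((w - iw) // 2, 0), max((iw - w) // 2, 0)
    ch, cw = min(h, ih), min(w, iw)
    out[dy:dy + ch, dx:dx + cw] = img[sy:sy + ch, sx:sx + cw]
    return out

def z_view_switch(texture, depth, dz, num_layers=4):
    """Synthesize the view after the camera moves by dz along the z-axis.

    Under a pinhole model, a layer at representative depth z_k is rescaled
    by scale_k = z_k / (z_k - dz): > 1 when moving into the scene (enlarge),
    < 1 when moving out (low-pass filter, then down-sample). Layers are
    composited back-to-front so near layers occlude far ones.
    """
    masks, edges = group_depth_layers(depth, num_layers)
    h, w = texture.shape[:2]
    out = np.zeros_like(texture)
    for k in reversed(range(num_layers)):          # farthest layer first
        z_k = 0.5 * (edges[k] + edges[k + 1])      # representative layer depth
        scale = z_k / max(z_k - dz, 1e-6)
        layer = np.where(masks[k][..., None], texture, 0.0)
        if scale < 1.0:
            # Anti-alias before shrinking (view switch out of the scene).
            layer = ndimage.gaussian_filter(layer, sigma=(1 / scale, 1 / scale, 0))
        layer = ndimage.zoom(layer, (scale, scale, 1), order=1)
        lmask = ndimage.zoom(masks[k].astype(np.float32), scale, order=1) > 0.5
        layer, lmask = center_fit(layer, h, w), center_fit(lmask, h, w)
        out[lmask] = layer[lmask]                  # near layers overwrite far
    return out

# Example: dolly the camera into the scene by dz = 0.5 depth units.
rng = np.random.default_rng(0)
tex = rng.random((120, 160, 3)).astype(np.float32)
dep = (1.0 + 4.0 * rng.random((120, 160))).astype(np.float32)
new_view = z_view_switch(tex, dep, dz=0.5)

Note the back-to-front loop: compositing farthest layers first means a closer layer that enlarges over a farther one naturally occludes it, which is the layer-based analogue of z-buffering in pixel-based DIBR.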
