Encoder-Driven Inpainting Strategy in Multiview Video Compression

In free viewpoint video systems, a user has the freedom to select a virtual view from which an image of the 3D scene is rendered, and the scene is commonly represented by color and depth images of multiple nearby viewpoints. In such a representation, there exists data redundancy across multiple dimensions: 1) a 3D voxel may be represented by pixels in multiple viewpoint images (inter-view redundancy); 2) a pixel patch may recur in a distant spatial region of the same image due to self-similarity (inter-patch redundancy); and 3) pixels in a local spatial region tend to be similar (inter-pixel redundancy). It is important to exploit these redundancies during inter-view prediction for effective multiview video compression. In this paper, we propose an encoder-driven inpainting strategy for inter-view predictive coding, where explicit instructions are transmitted minimally, and the decoder is left to independently recover the remaining missing data via inpainting, resulting in lower coding overhead. In particular, after pixels in a reference view are projected to a target view via depth-image-based rendering at the decoder, the remaining holes in the target view are filled via an inpainting process in a block-by-block manner. First, blocks are ordered in terms of difficulty-to-inpaint by the decoder. Then, explicit instructions are sent only for the reconstruction of the most difficult blocks. Specifically, the missing pixels are explicitly coded via a graph Fourier transform or a sparsification procedure using the discrete cosine transform, leading to low coding cost. For blocks that are easy to inpaint, the decoder independently completes missing pixels via template-based inpainting. We apply our proposed scheme to frames in a prediction structure defined by JCT-3V where inter-view prediction is dominant, and experimentally we show that our scheme achieves up to 3 dB gain in peak signal-to-noise ratio (PSNR) in reconstructed image quality over a comparable 3D-High Efficiency Video Coding implementation using a fixed $16 \times 16$ block size.
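
To illustrate why a graph Fourier transform can code a block cheaply, the following is a minimal sketch and not the paper's implementation. It assumes a 4-connected pixel graph whose edge weights are 1 inside a region and 0 across a region boundary; the function names (`grid_laplacian`, `gft_basis`, `gft`, `igft`), the label map, and the toy block values are all hypothetical choices made for the example.

```python
import numpy as np

def grid_laplacian(labels):
    """Combinatorial Laplacian L = D - W of a 4-connected pixel grid.

    Assumption for this sketch: an edge has weight 1 when both pixels carry
    the same region label and weight 0 otherwise.
    """
    h, w = labels.shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:  # right neighbour
                j = i + 1
                W[i, j] = W[j, i] = float(labels[r, c] == labels[r, c + 1])
            if r + 1 < h:  # bottom neighbour
                j = i + w
                W[i, j] = W[j, i] = float(labels[r, c] == labels[r + 1, c])
    return np.diag(W.sum(axis=1)) - W

def gft_basis(L):
    """Eigenvectors of the Laplacian, sorted by ascending graph frequency."""
    eigvals, U = np.linalg.eigh(L)
    return eigvals, U

def gft(block, U):
    """Forward transform: project the vectorised block onto the basis."""
    return U.T @ block.ravel()

def igft(coeffs, U, shape):
    """Inverse transform back to the pixel domain."""
    return (U @ coeffs).reshape(shape)

# Toy 4x4 block: two flat regions separated by a vertical edge.
block = np.full((4, 4), 50.0)
block[:, 2:] = 200.0
labels = block > 100.0  # hypothetical region map, e.g. derived from depth

_, U = gft_basis(grid_laplacian(labels))
coeffs = gft(block, U)
significant = int(np.count_nonzero(np.abs(coeffs) > 1e-8))
reconstructed = igft(coeffs, U, block.shape)
```

Because the toy block is constant within each of the two graph components, it lies in the null space of the Laplacian, so at most two coefficients are non-negligible and the inverse transform recovers the block. A fixed transform such as the DCT would generally spread a sharp edge like this over more coefficients.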
