Pushing the Boundaries of View Extrapolation With Multiplane Images

We explore the problem of view synthesis from a narrow-baseline pair of images, focusing on generating high-quality view extrapolations with plausible disocclusions. Our method builds upon prior work on predicting a multiplane image (MPI), which represents scene content as a set of RGBA planes within a reference view frustum and renders novel views by projecting this content into the target viewpoints. We present a theoretical analysis showing that the range of views that can be rendered from an MPI increases linearly with the MPI disparity sampling frequency, as well as a novel MPI prediction procedure that theoretically enables view extrapolations of up to 4 times the lateral viewpoint movement allowed by prior work. Our method ameliorates two specific issues that limit the range of views renderable by prior methods: 1) We expand the range of novel views that can be rendered without depth discretization artifacts by using a 3D convolutional network architecture, together with a randomized-resolution training procedure, so that our model can predict MPIs with increased disparity sampling frequency. 2) We reduce the repeated-texture artifacts seen in disocclusions by enforcing a constraint that the appearance of hidden content at any depth must be drawn from visible content at or behind that depth.
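To make the MPI rendering step concrete, the following is a minimal sketch of compositing a stack of RGBA planes into a laterally shifted view. It is not the paper's implementation: it assumes a purely horizontal camera translation, replaces the per-plane homography with an integer pixel shift proportional to each plane's disparity, and uses the standard "over" operator from back to front. The function name `render_mpi`, its parameters, and the sign convention for the shift are hypothetical choices made for illustration.

```python
import numpy as np

def render_mpi(rgba_planes, disparities, lateral_shift):
    """Composite an MPI into a laterally shifted view (illustrative sketch).

    rgba_planes:   (D, H, W, 4) float array in [0, 1], ordered back to front.
    disparities:   (D,) pixels of image motion per unit of camera movement
                   for each plane (assumed; larger for nearer planes).
    lateral_shift: horizontal camera movement in those same units.
    """
    height, width = rgba_planes.shape[1:3]
    out = np.zeros((height, width, 3))
    for plane, disp in zip(rgba_planes, disparities):
        # Assumed convention: content moves opposite to the camera.
        shift = -int(round(disp * lateral_shift))
        warped = np.roll(plane, shift, axis=1)
        # np.roll wraps around; make the wrapped-in columns transparent.
        if shift > 0:
            warped[:, :shift] = 0.0
        elif shift < 0:
            warped[:, shift:] = 0.0
        rgb, alpha = warped[..., :3], warped[..., 3:4]
        # "Over" compositing: the nearer plane covers what is behind it.
        out = rgb * alpha + out * (1.0 - alpha)
    return out
```

In this simplified model, moving the camera shifts near planes further than far planes, so pixels of the back planes that were covered in the reference view become visible. Those are the disocclusions whose appearance the predicted MPI has to supply.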
