Paper Information - Pushing the Boundaries of View Extrapolation With Multiplane Images

We explore the problem of view synthesis from a narrow baseline pair of images, and focus on generating high-quality view extrapolations with plausible disocclusions. Our method builds upon prior work in predicting a multiplane image (MPI), which represents scene content as a set of RGBA planes within a reference view frustum and renders novel views by projecting this content into the target viewpoints. We present a theoretical analysis showing how the range of views that can be rendered from an MPI increases linearly with the MPI disparity sampling frequency, as well as a novel MPI prediction procedure that theoretically enables view extrapolations of up to 4 times the lateral viewpoint movement allowed by prior work. Our method ameliorates two specific issues that limit the range of views renderable by prior methods: 1) We expand the range of novel views that can be rendered without depth discretization artifacts by using a 3D convolutional network architecture along with a randomized-resolution training procedure to allow our model to predict MPIs with increased disparity sampling frequency. 2) We reduce the repeated texture artifacts seen in disocclusions by enforcing a constraint that the appearance of hidden content at any depth must be drawn from visible content at or behind that depth.
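The rendering step described in the abstract, where a stack of RGBA planes is projected into a target view and combined, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`shift_plane`, `composite_mpi`, `render_shifted`) are hypothetical, the planes are assumed to be ordered back to front, and the per-plane homography warp is replaced by an integer horizontal pixel shift proportional to each plane's disparity, which only approximates a purely lateral camera movement. The sign convention of the shift is also an assumption.

```python
import numpy as np


def shift_plane(plane, dx):
    """Shift an (H, W, 4) RGBA plane horizontally by dx pixels.

    Pixels shifted in from outside the plane are fully transparent (all zeros).
    """
    out = np.zeros_like(plane)
    w = plane.shape[1]
    if dx == 0:
        out[:] = plane
    elif 0 < dx < w:
        out[:, dx:] = plane[:, :-dx]
    elif -w < dx < 0:
        out[:, :dx] = plane[:, -dx:]
    return out


def composite_mpi(rgba_planes):
    """Alpha-composite RGBA planes with the 'over' operator.

    Assumes rgba_planes is ordered from the farthest plane to the nearest.
    """
    planes = np.asarray(rgba_planes, dtype=np.float64)
    out = np.zeros(planes.shape[1:3] + (3,))
    for plane in planes:
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out


def render_shifted(rgba_planes, disparities, baseline):
    """Render a laterally shifted view (simplified stand-in for the warp).

    Each plane moves by round(baseline * disparity) pixels, so nearer planes
    (larger disparity) move farther than distant ones.
    """
    shifted = [
        shift_plane(np.asarray(p, dtype=np.float64), int(round(baseline * d)))
        for p, d in zip(rgba_planes, disparities)
    ]
    return composite_mpi(shifted)
```

In this simplified model, the gap in pixel shift between adjacent planes grows with the lateral movement, which gives some intuition for why sampling disparity more finely would allow larger movements before the discretization into planes becomes visible.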
