Novel View Synthesis from a Single Image via Unsupervised Learning

View synthesis aims to generate novel views from one or more given source views. Although existing methods have achieved promising performance, they usually require paired views of different poses to learn a pixel transformation. This paper proposes an unsupervised network that learns such a pixel transformation from a single source viewpoint. Specifically, the network consists of a token transformation module (TTM), which transforms the features extracted from a source viewpoint image into an intrinsic representation with respect to a pre-defined reference pose, and a view generation module (VGM), which synthesizes an arbitrary view from that representation. The learned transformation allows a novel view to be synthesized from any single source viewpoint image of unknown pose. Experiments on widely used view synthesis datasets demonstrate that the proposed network produces results comparable to state-of-the-art methods, even though learning is unsupervised and only a single source viewpoint image is required to generate a novel view. The code will be made available.
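To make the two-module pipeline concrete, below is a minimal PyTorch sketch of the design described above. The module names follow the abstract (TTM and VGM), but the layer choices, feature dimensions, and target-pose encoding are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal sketch of the TTM + VGM pipeline, assuming a PyTorch implementation.
# All layer sizes and the pose encoding are placeholders for illustration.
import torch
import torch.nn as nn

class TokenTransformationModule(nn.Module):
    """Maps features of a source view (unknown pose) to an intrinsic
    representation tied to a pre-defined reference pose."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Transformation toward the reference-pose representation.
        self.transform = nn.Conv2d(feat_dim, feat_dim, 3, padding=1)

    def forward(self, src_img):
        return self.transform(self.encoder(src_img))

class ViewGenerationModule(nn.Module):
    """Synthesizes a target view from the intrinsic representation and a
    target-pose code (here a simple embedding broadcast over the feature map)."""
    def __init__(self, feat_dim=256, pose_dim=16):
        super().__init__()
        self.pose_embed = nn.Linear(pose_dim, feat_dim)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, intrinsic_rep, target_pose):
        pose = self.pose_embed(target_pose)[:, :, None, None]
        return self.decoder(intrinsic_rep + pose)

# Usage: synthesize a novel view from a single source image of unknown pose.
ttm, vgm = TokenTransformationModule(), ViewGenerationModule()
src = torch.randn(1, 3, 64, 64)    # single source-view image
pose = torch.randn(1, 16)          # target-pose code (placeholder)
novel_view = vgm(ttm(src), pose)   # (1, 3, 64, 64) synthesized view
```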
