Novel View Synthesis for Large-scale Scene using Adversarial Loss

Novel view synthesis aims to generate images of a scene from viewpoints other than those of the given input images. Most previous work focuses on generating novel views of individual objects against a fixed background. However, in applications such as virtual reality and robotic manipulation, large background changes can occur due to camera egomotion, and novel views of a large-scale environment may be distorted if the scene structure is not taken into account. In this work, we propose a fully convolutional network that exploits structural information explicitly by incorporating inverse depth features, which are obtained from CNNs trained with sparsely labeled depth values. The framework can easily fuse multiple input images taken from different viewpoints. To fill in missing textures in the generated image, an adversarial loss is applied, which also improves the overall image quality. We evaluate our method on the KITTI dataset; the results show that it can generate undistorted novel views of large-scale scenes. The effectiveness of our approach is demonstrated through qualitative and quantitative evaluation.
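
The abstract describes training the view-synthesis network with an adversarial term on top of a reconstruction objective. The following is a minimal PyTorch-style sketch of how such a combined generator/discriminator loss is commonly implemented; the function names, the choice of L1 as the reconstruction term, and the lambda_adv weight are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

def generator_loss(discriminator, fake_img, target_img, lambda_adv=0.01):
    # Reconstruction term keeps the synthesized view close to ground truth.
    rec = l1(fake_img, target_img)
    # Adversarial term: the generator tries to make the discriminator
    # label its output as real (all-ones target), which encourages it
    # to hallucinate plausible texture in disoccluded regions.
    logits = discriminator(fake_img)
    adv = bce(logits, torch.ones_like(logits))
    return rec + lambda_adv * adv

def discriminator_loss(discriminator, fake_img, real_img):
    # Standard GAN discriminator objective: real -> 1, fake -> 0.
    # detach() stops gradients from flowing back into the generator here.
    real_logits = discriminator(real_img)
    fake_logits = discriminator(fake_img.detach())
    return (bce(real_logits, torch.ones_like(real_logits))
            + bce(fake_logits, torch.zeros_like(fake_logits)))
```

In a training loop, the two losses would be minimized alternately with separate optimizers for the generator and discriminator, as is standard for GAN training.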
