Neural View-Interpolation for Sparse Light Field Video

We suggest representing light field (LF) videos as "one-off" neural networks (NNs), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, an NN LF will likely have lower quality than a pixel-basis representation of the same size. Second, only little training data is available for sparse LF videos, e.g., nine exemplars per frame. Third, there is no generalization across LFs, only across view and time, so a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Unlike the linear pixel basis, an NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As has been observed for many NNs, this representation is then interpolatable: if the image output is plausible at the sparse training coordinates, it is plausible at all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and, consequently, fast learning and fast execution.
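To make the idea concrete, here is a minimal sketch, assuming PyTorch, of a per-video coordinate-conditioned network with a differentiable warping step. The names (`CoordToFlow`, `warp`), the architecture, and the training setup are illustrative assumptions, not the paper's actual implementation; in particular, the occlusion-aware part of the warp is omitted for brevity.

```python
# Minimal sketch (PyTorch assumed): a small MLP maps a continuous
# (view u, view v, time t) coordinate to a dense flow field, which
# differentiably warps a captured reference view into the target view.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordToFlow(nn.Module):
    """Maps a 3D (u, v, t) coordinate to an HxW field of 2D offsets."""
    def __init__(self, h, w, hidden=128):
        super().__init__()
        self.h, self.w = h, w
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, h * w * 2),  # one 2D offset per pixel
        )

    def forward(self, coord):                      # coord: (B, 3)
        flow = self.mlp(coord).view(-1, self.h, self.w, 2)
        return torch.tanh(flow)                    # bound offsets to [-1, 1]

def warp(image, flow):
    """Differentiably warp `image` (B,C,H,W) by a per-pixel offset field."""
    b, _, h, w = image.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2).to(image.device)
    # grid_sample does bilinear lookup, so gradients reach the offsets.
    return F.grid_sample(image, base + flow, align_corners=True)

# Train on the sparse views only: for each captured coordinate, predict a
# flow, warp the reference view, and penalize the difference to the capture.
h, w = 256, 256
net = CoordToFlow(h, w)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
coords = torch.rand(9, 3)            # e.g., 9 sparse (u, v, t) exemplars
views = torch.rand(9, 3, h, w)       # their captured RGB images
reference = views[4:5]               # central view as warping source

for step in range(1000):
    flow = net(coords)                               # (9, H, W, 2)
    pred = warp(reference.expand(9, -1, -1, -1), flow)
    loss = F.l1_loss(pred, views)
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, any continuous coordinate yields an interpolated frame:
novel = warp(reference, net(torch.tensor([[0.5, 0.5, 0.37]])))
```

Because the warp only re-samples captured pixels, the trainable parameters stay compact (the flow MLP), which is one way to read the abstract's claim of fast learning and fast execution: the network learns geometry-like flow rather than memorizing colors, and plausibility at the nine training coordinates carries over to the continuum between them.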
