X-Fields

We propose to represent an X-Field, a set of 2D images taken across different view, time, or illumination conditions (i.e., video, light field, reflectance fields, or combinations thereof), by learning a neural network (NN) that maps view, time, or light coordinates to 2D images. Executing this NN at new coordinates then performs joint view, time, and light interpolation. The key idea that makes this workable is an NN that already knows the "basic tricks" of graphics (lighting, 3D projection, occlusion) in a hard-coded, differentiable form. The NN represents the input to that rendering as an implicit map that, for any view, time, or light coordinate and for any pixel, quantifies how the pixel will move when the view, time, or light coordinates change (the Jacobian of pixel position with respect to view, time, illumination, etc.). Our X-Field representation is trained on a single scene within minutes, yielding a compact set of trainable parameters and hence real-time navigation in view, time, and illumination.
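
To make the mechanism concrete, the following is a minimal sketch of this idea, not the authors' implementation: a small network maps an X-Field coordinate to a per-pixel Jacobian, and a hard-coded differentiable warp re-projects a captured neighbor image to the queried coordinate. PyTorch is assumed; the names (`JacobianNet`, `warp`), the resolution, and the coordinates are all illustrative.

```python
# Minimal sketch (illustrative, not the paper's code): a network predicts,
# for each pixel, how it moves per unit change of each X-Field coordinate
# dimension; a fixed differentiable warp then resamples a captured image.
import torch
import torch.nn.functional as F

class JacobianNet(torch.nn.Module):
    """Maps an X-Field coordinate, e.g. [view_x, view_y, time], to a dense
    Jacobian field of shape (H, W, 2, D): per-pixel x/y motion per unit
    change of each of the D coordinate dimensions."""
    def __init__(self, dims=3, h=64, w=64, hidden=128):
        super().__init__()
        self.h, self.w, self.dims = h, w, dims
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dims, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, h * w * 2 * dims),
        )

    def forward(self, coord):                 # coord: (D,)
        return self.net(coord).view(self.h, self.w, 2, self.dims)

def warp(image, flow):
    """Hard-coded differentiable warp: resample `image` (1, C, H, W) at
    positions displaced by `flow` (H, W, 2), in [-1, 1] grid units."""
    _, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1) + flow        # (H, W, 2)
    return F.grid_sample(image, grid.unsqueeze(0), align_corners=True)

# Interpolation: predict per-pixel motion at the query coordinate, then
# warp a captured neighbor image by (Jacobian @ coordinate offset).
net = JacobianNet()
neighbor_img = torch.rand(1, 3, 64, 64)                # a captured sample
neighbor_coord = torch.tensor([0.0, 0.0, 0.0])
query_coord = torch.tensor([0.2, 0.0, 0.5])            # novel view/time
jac = net(query_coord)                                 # (H, W, 2, D)
flow = jac @ (neighbor_coord - query_coord)            # (H, W, 2)
novel_img = warp(neighbor_img, flow)                   # reconstruction

# Training (not shown) would minimize a reconstruction loss between such
# warped neighbors and the captured image at each observed coordinate.
```

A plain MLP and a single neighbor keep the sketch self-contained; the paper's decoder, loss, and blending of multiple warped neighbors would replace these in a full implementation.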
