Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes

We present a method to perform novel view and time synthesis of dynamic scenes, requiring only a monocular video with known camera poses as input. To do this, we introduce Neural Scene Flow Fields, a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion. Our representation is optimized through a neural network to fit the observed input views. We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion. We conduct a number of experiments that demonstrate our approach significantly outperforms recent monocular view synthesis methods, and show qualitative results of space-time view synthesis on a variety of real-world videos.
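To make the idea of a time-variant continuous scene function concrete, below is a minimal, illustrative PyTorch sketch of how such a field could be parameterized: a coordinate MLP that maps a positionally encoded space-time sample (x, y, z, t) and a viewing direction to color, volume density, and forward/backward 3D scene flow. All names, layer widths, and encoding settings here (e.g. `NeuralSceneFlowField`, `positional_encoding`, the head sizes) are assumptions for illustration only, not the paper's exact architecture or training setup.

```python
# Illustrative sketch (assumed architecture, not the paper's exact model):
# an MLP mapping encoded (x, y, z, t) and a view direction to
# color, volume density, and forward/backward 3D scene flow.
import torch
import torch.nn as nn


def positional_encoding(x: torch.Tensor, num_freqs: int) -> torch.Tensor:
    """Map each input coordinate to [sin(2^k x), cos(2^k x)] features."""
    freqs = 2.0 ** torch.arange(num_freqs, device=x.device)
    angles = x[..., None] * freqs                 # (..., dim, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)              # (..., dim * 2 * num_freqs)


class NeuralSceneFlowField(nn.Module):
    """Hypothetical time-variant field: (x, t, d) -> (rgb, sigma, scene flow)."""

    def __init__(self, xyz_freqs: int = 10, dir_freqs: int = 4, width: int = 256):
        super().__init__()
        in_xyzt = 4 * 2 * xyz_freqs               # encoded (x, y, z, t)
        in_dir = 3 * 2 * dir_freqs                # encoded viewing direction
        self.trunk = nn.Sequential(
            nn.Linear(in_xyzt, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(width, 1)     # volume density
        self.flow_head = nn.Linear(width, 6)      # 3D offsets toward t+1 and t-1
        self.rgb_head = nn.Sequential(            # view-dependent color
            nn.Linear(width + in_dir, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 3), nn.Sigmoid(),
        )
        self.xyz_freqs, self.dir_freqs = xyz_freqs, dir_freqs

    def forward(self, xyzt: torch.Tensor, viewdir: torch.Tensor):
        h = self.trunk(positional_encoding(xyzt, self.xyz_freqs))
        sigma = torch.relu(self.sigma_head(h))                 # (N, 1)
        flow = self.flow_head(h)                               # (N, 6)
        d_enc = positional_encoding(viewdir, self.dir_freqs)
        rgb = self.rgb_head(torch.cat([h, d_enc], dim=-1))     # (N, 3)
        return rgb, sigma, flow


if __name__ == "__main__":
    model = NeuralSceneFlowField()
    xyzt = torch.rand(1024, 4)    # (x, y, z, t) samples along camera rays
    dirs = torch.rand(1024, 3)    # corresponding viewing directions
    rgb, sigma, flow = model(xyzt, dirs)
    print(rgb.shape, sigma.shape, flow.shape)  # (1024, 3) (1024, 1) (1024, 6)
```

In a full system, the predicted color and density would be composited along camera rays with volume rendering and fit to the input frames, while the predicted scene flow links samples across neighboring time steps; the simplifications above (and any omitted components) are deliberate, since this is only a sketch of the representation, not a reimplementation of the method.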
