Paper information - Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes

We present a method to perform novel view and time synthesis of dynamic scenes, requiring only a monocular video with known camera poses as input. To do this, we introduce Neural Scene Flow Fields, a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion. This representation is parameterized by a neural network that is optimized to fit the observed input views. We show that it can be used for a wide range of in-the-wild scenes, including those with thin structures, view-dependent effects, and complex motion. Experiments demonstrate that our approach significantly outperforms recent monocular view synthesis methods, and we show qualitative results of space-time view synthesis on a variety of real-world videos.
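The abstract describes the scene as a time-variant continuous function of appearance, geometry, and 3D scene motion, fit to the input views. The NumPy sketch below illustrates that idea under loudly stated assumptions: the learned network is replaced by a hypothetical analytic `toy_scene_flow_field` (a red sphere translating along x), color and density are composited with standard NeRF-style volume rendering, and ray samples can be displaced by the predicted forward or backward scene flow and rendered at the neighboring time. The function names, constants, and sampling scheme are illustrative assumptions, not taken from the paper; this is a sketch, not the authors' implementation.

```python
import numpy as np


def toy_scene_flow_field(x, d, t):
    """Hypothetical analytic stand-in for the learned network (not the paper's model).

    Maps sample points x (N, 3), view directions d (N, 3), and a time index t
    to per-point color (N, 3), density (N,), and forward/backward 3D scene flow
    (N, 3). The toy scene is a red sphere of radius 0.5 whose center moves
    +0.1 along x per time step; d is accepted but ignored (no view dependence).
    """
    center = np.array([0.1 * t, 0.0, 2.0])
    dist = np.linalg.norm(x - center, axis=-1)
    sigma = np.where(dist < 0.5, 10.0, 0.0)  # opaque inside the sphere, empty outside
    rgb = np.tile([1.0, 0.0, 0.0], (x.shape[0], 1))  # constant red color
    flow_fwd = np.tile([0.1, 0.0, 0.0], (x.shape[0], 1))  # displacement from t to t + 1
    flow_bwd = -flow_fwd  # displacement from t to t - 1
    return rgb, sigma, flow_fwd, flow_bwd


def composite(rgb, sigma, delta):
    """NeRF-style alpha compositing of samples ordered front to back."""
    alpha = 1.0 - np.exp(-sigma * delta)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * transmittance
    return (weights[:, None] * rgb).sum(axis=0)


def render_ray(field, origin, direction, t, warp_to=None,
               near=0.0, far=4.0, n_samples=128):
    """Render one ray at time t, or, if warp_to is t + 1 or t - 1, displace the
    samples by the predicted scene flow and query the field at that time."""
    s = np.linspace(near, far, n_samples)
    delta = np.full(n_samples, (far - near) / (n_samples - 1))
    pts = origin + s[:, None] * direction
    dirs = np.broadcast_to(direction, pts.shape)
    if warp_to is None:
        rgb, sigma, _, _ = field(pts, dirs, t)
    else:
        assert warp_to in (t - 1, t + 1), "only adjacent time steps are supported"
        _, _, flow_fwd, flow_bwd = field(pts, dirs, t)
        flow = flow_fwd if warp_to == t + 1 else flow_bwd
        # Query color and density at the flowed points, at the neighboring time.
        rgb, sigma, _, _ = field(pts + flow, dirs, warp_to)
    return composite(rgb, sigma, delta)


origin = np.zeros(3)
direction = np.array([0.0, 0.0, 1.0])
direct = render_ray(toy_scene_flow_field, origin, direction, t=0)
forward = render_ray(toy_scene_flow_field, origin, direction, t=0, warp_to=1)
backward = render_ray(toy_scene_flow_field, origin, direction, t=0, warp_to=-1)
print(direct, forward, backward)
```

Because the toy forward and backward flows match the sphere's actual motion, the warped renderings agree with the direct one. With a learned field, a disagreement between them is the kind of signal a temporal photometric consistency term could penalize; the paper's actual losses are not reproduced here.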

Noah Snavely | Oliver Wang | Simon Niklaus | Zhengqi Li