JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields

This paper presents JAWS, an optimization-driven approach that robustly transfers visual cinematic features from a reference in-the-wild video clip to a newly generated clip. To this end, we rely on an implicit neural representation (INR) to compute a clip that shares the same cinematic features as the reference. We propose a general formulation of the camera optimization problem in an INR that computes extrinsic and intrinsic camera parameters as well as timing. By leveraging the differentiability of neural representations, we back-propagate our cinematic losses, measured on proxy estimators, through a NeRF network directly to the proposed cinematic parameters. We also introduce specific enhancements, such as guidance maps, to improve overall quality and efficiency. Results demonstrate the capacity of our system to replicate well-known camera sequences from movies, adapting the framing, camera parameters, and timing of the generated clip to maximize similarity with the reference clip.
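
To make the gradient flow concrete, here is a minimal, runnable sketch of such an optimization loop in PyTorch. The functions `render_nerf` and `cinematic_loss` are hypothetical stand-ins introduced only for illustration: a real system would volume-render a pretrained NeRF and score proxy estimators (e.g., on-screen pose or optical flow) against the reference clip.

```python
# Minimal sketch: optimize camera extrinsics, intrinsics, and timing by
# back-propagating a cinematic loss through a differentiable renderer.
import torch

def render_nerf(pose, focal, t):
    """Toy differentiable 'renderer' (hypothetical): any smooth function of
    the camera parameters suffices to illustrate gradient flow back to them.
    A real NeRF would volume-render rays defined by pose/focal at time t."""
    return torch.sigmoid(pose.sum() + 1e-3 * focal + t).expand(3, 64, 64)

def cinematic_loss(frame, ref_frame):
    """Stand-in for losses measured on proxy estimators (framing, flow)."""
    return torch.nn.functional.mse_loss(frame, ref_frame)

# Cinematic parameters to optimize: extrinsics (axis-angle rotation +
# translation), intrinsics (focal length), and timing into the dynamic scene.
pose = torch.zeros(6, requires_grad=True)
focal = torch.tensor(500.0, requires_grad=True)
t = torch.tensor(0.5, requires_grad=True)

ref_frame = torch.rand(3, 64, 64)  # one frame of the reference clip
optimizer = torch.optim.Adam([pose, focal, t], lr=1e-2)

for step in range(200):
    optimizer.zero_grad()
    frame = render_nerf(pose, focal, t)      # differentiable rendering
    loss = cinematic_loss(frame, ref_frame)  # cinematic similarity
    loss.backward()                          # gradients reach pose/focal/t
    optimizer.step()
```

Note that the NeRF itself stays frozen here; only the camera parameters receive gradient updates, which is what makes the per-parameter learning rates of Adam a natural fit for this inversion-style problem.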
