NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. We build on Neural Radiance Fields (NeRF), which uses the weights of a multi-layer perceptron to model the density and color of a scene as a function of 3D coordinates. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. We introduce a series of extensions to NeRF to address these issues, thereby enabling accurate reconstructions from unstructured image collections taken from the internet. We apply our system, dubbed NeRF-W, to internet photo collections of famous landmarks, and demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art.
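
As an illustration of the representation the abstract describes, the following is a minimal sketch of a multi-layer perceptron that maps 3D coordinates to a density and a color. It is not the authors' architecture: the layer sizes, the random initialization, and the names `init_mlp` and `radiance_field` are assumptions made for illustration. It also leaves out viewing-direction conditioning, positional encoding, and the extensions that NeRF-W adds for variable illumination and transient occluders.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def radiance_field(params, xyz):
    """Map 3D points of shape (N, 3) to density (N,) and RGB color (N, 3)."""
    h = xyz
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)          # ReLU hidden layers
    w, b = params[-1]
    out = h @ w + b                             # (N, 4): 1 density + 3 color channels
    density = np.maximum(out[:, 0], 0.0)        # density is non-negative
    color = 1.0 / (1.0 + np.exp(-out[:, 1:]))   # sigmoid keeps RGB in [0, 1]
    return density, color

rng = np.random.default_rng(0)
params = init_mlp(rng, [3, 32, 32, 4])          # untrained weights, illustration only
points = rng.uniform(-1.0, 1.0, (5, 3))         # five sample 3D coordinates
density, color = radiance_field(params, points)
```

In this sketch the scene is stored entirely in `params`: querying any 3D point returns how opaque the scene is there and what color it has, which is what a volume renderer would then integrate along camera rays.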
