SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

Neural Radiance Fields (NeRFs) have emerged as a popular approach for novel view synthesis. While NeRFs are quickly being adapted for a wider set of applications, intuitively editing NeRF scenes is still an open challenge. One important editing task is the removal of unwanted objects from a 3D scene, such that the replaced region is visually plausible and consistent with its context. We refer to this task as 3D inpainting. In 3D, solutions must be both consistent across multiple views and geometrically valid. In this paper, we propose a novel 3D inpainting method that addresses these challenges. Given a small set of posed images and sparse annotations in a single input image, our framework first rapidly obtains a 3D segmentation mask for a target object. Using the mask, a perceptual optimization-based approach is then introduced that leverages learned 2D image inpainters, distilling their information into 3D space, while ensuring view consistency. We also address the lack of a diverse benchmark for evaluating 3D scene inpainting methods by introducing a dataset comprised of challenging real-world scenes. In particular, our dataset contains views of the same scene with and without a target object, enabling more principled benchmarking of the 3D inpainting task. We first demonstrate the superiority of our approach on multiview segmentation, comparing to NeRF-based methods and 2D segmentation approaches. We then evaluate on the task of 3D inpainting, establishing state-of-the-art performance against other NeRF manipulation algorithms, as well as a strong 2D image inpainter baseline. Project Page: https://spinnerf3d.github.io
