SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field

Despite the great success of 2D editing with user-friendly tools such as Photoshop, semantic strokes, or even text prompts, comparable capabilities in 3D remain limited: existing approaches either rely on 3D modeling skills or allow editing within only a few categories. In this paper, we present a novel semantic-driven NeRF editing approach that enables users to edit a neural radiance field with a single image and faithfully delivers edited novel views with high fidelity and multi-view consistency. To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space, and develop a series of techniques to aid the editing process, including cyclic constraints with a proxy mesh to facilitate geometric supervision, a color compositing mechanism to stabilize semantic-driven texture editing, and a feature-cluster-based regularization to keep irrelevant content unchanged. Extensive experiments and editing examples on both real-world and synthetic data demonstrate that our method achieves photo-realistic 3D editing using only a single edited image, pushing the boundary of semantic-driven editing in real-world 3D scenes. Our project webpage: https://zju3dv.github.io/sine/.
