RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture

Techniques for capturing 3D indoor scenes are now widely available, but the meshes they produce leave much to be desired. In this paper, we propose "RoomDreamer", which leverages natural language prompts to synthesize a new room in a different style. Unlike existing image synthesis methods, our work addresses the challenge of synthesizing geometry and texture that are simultaneously aligned with both the input scene structure and the prompt. The key insight is that a scene should be treated as a whole, accounting for both its texture and its geometry. The proposed framework consists of two components: Geometry Guided Diffusion and Mesh Optimization. Geometry Guided Diffusion ensures a consistent scene style by applying the 2D prior to the entire scene at once. Mesh Optimization jointly improves the geometry and texture and removes artifacts from the scanned scene. To validate the proposed method, we run extensive experiments on real indoor scenes scanned with smartphones, demonstrating the effectiveness of our approach.
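To make the joint optimization idea concrete, the following is a minimal toy sketch of a score-distillation-style loop (in the spirit of DreamFusion's SDS) in which a geometry-conditioned 2D diffusion prior supplies gradients that update texture and geometry parameters together. Everything here is a stand-in: the "renderer" is a linear mix, the "diffusion model" is a stub, and the parameter vectors are placeholders; it is not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: texture and geometry as flat parameter vectors.
texture = rng.normal(size=64)
geometry = rng.normal(size=64)


def render(texture, geometry):
    # Hypothetical differentiable renderer: here just a linear mix,
    # so the chain rule below reduces to constant factors.
    return 0.7 * texture + 0.3 * geometry


def diffusion_eps(noisy_image, depth_condition, t):
    # Stand-in for a geometry-conditioned 2D diffusion prior
    # (a real model would predict the noise added at timestep t,
    # conditioned on e.g. a rendered depth map).
    return noisy_image - depth_condition


def sds_grad(image, depth, t, eps):
    # SDS-style gradient: w(t) * (eps_hat - eps), pushed back
    # through the renderer to the scene parameters.
    alpha = 1.0 - t
    noisy = np.sqrt(alpha) * image + np.sqrt(1.0 - alpha) * eps
    eps_hat = diffusion_eps(noisy, depth, t)
    return (1.0 - alpha) * (eps_hat - eps)


lr = 0.1
for step in range(100):
    t = rng.uniform(0.02, 0.98)          # random diffusion timestep
    eps = rng.normal(size=64)            # fresh noise sample
    img = render(texture, geometry)
    depth = geometry                      # condition derived from geometry
    g = sds_grad(img, depth, t, eps)
    # Chain rule through the toy renderer (d img / d texture = 0.7, etc.),
    # so texture and geometry receive a shared, consistent signal.
    texture -= lr * 0.7 * g
    geometry -= lr * 0.3 * g
```

The point of the sketch is the structure, not the numbers: both parameter sets are updated from one gradient computed on the full rendered scene, which is what keeps style and geometry coherent rather than optimizing them in isolation.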