Neural Radiance Fields for Manhattan Scenes with Unknown Manhattan Frame

Novel view synthesis and 3D modeling with implicit neural field representations have been shown to be very effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. Most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors, so they cannot benefit from vague, high-level scene priors given in the form of scene descriptions. In this work, we leverage the geometric prior of Manhattan scenes to improve implicit neural radiance field representations. More precisely, we assume only that the indoor scene under investigation is known to be Manhattan, with no additional information whatsoever and with an unknown Manhattan coordinate frame. This high-level prior is used to self-supervise the surface normals derived explicitly from the implicit neural fields. Our modeling allows us to group the derived normals and exploit their orthogonality constraints for self-supervision. Our extensive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over the established baselines.
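The idea of grouping normals and penalizing non-orthogonality can be illustrated with a minimal NumPy sketch. It is not the authors' formulation, which the abstract does not spell out. It assumes unit-length normals are already available, and the function names (`group_normals`, `estimate_axes`, `orthogonality_loss`) and the sign-agnostic nearest-axis assignment are hypothetical choices made only for illustration.

```python
import numpy as np


def group_normals(normals, axes):
    """Assign each unit normal to the axis it is most parallel to.

    The absolute value makes the assignment sign-agnostic, so a normal
    and its negation (opposite walls) land in the same group.
    """
    return np.argmax(np.abs(normals @ axes.T), axis=1)


def estimate_axes(normals, labels, k=3):
    """Estimate one dominant direction per group.

    Illustrative assumption: the dominant direction is the principal
    eigenvector of the group's scatter matrix sum(n n^T).
    """
    axes = []
    for j in range(k):
        group = normals[labels == j]
        scatter = group.T @ group          # 3x3, symmetric
        _, vecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
        axes.append(vecs[:, -1])           # eigenvector of largest eigenvalue
    return np.stack(axes)


def orthogonality_loss(axes):
    """Sum of squared pairwise dot products between distinct axes.

    The value is zero exactly when the axes are mutually orthogonal.
    """
    gram = axes @ axes.T
    off_diag = gram - np.diag(np.diag(gram))
    return float(np.sum(off_diag ** 2) / 2.0)
```

In a training setting, a differentiable version of such a penalty could be added to the rendering loss. The grouping step is what removes the need to know the Manhattan frame in advance.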
