Paper Information - Single-Shot Scene Reconstruction

Single-Shot Scene Reconstruction

We introduce a novel scene reconstruction method to infer a fully editable and re-renderable model of a 3D road scene from a single image. We represent movable objects separately from the immovable background, and recover a full 3D model of each distinct object as well as their spatial relations in the scene. Based on transformer-based detectors and neural implicit 3D representations, we build a Scene Decomposition Network (SDN) that reconstructs the scene, and the reconstruction can further be used in analysis-by-synthesis via differentiable rendering. Trained only on simulated road scenes, our method generalizes well to real data in the same class without any adaptation thanks to its strong inductive priors. Experiments on two synthetic-real dataset pairs (PD-DDAD and VKITTI-KITTI) show that our method can robustly recover scene geometry and appearance, as well as reconstruct and re-render the scene from novel viewpoints.
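The analysis-by-synthesis idea mentioned in the abstract can be illustrated with a deliberately tiny sketch. This is not the paper's SDN or its renderer: it assumes a hypothetical 1D "renderer" that draws a Gaussian blob at an object position, and it refines an initial position estimate by gradient descent on the photometric error against the observed image. The names `render`, `loss_and_grad`, and `refine`, and all parameter values, are illustrative assumptions.

```python
import math

def render(center, width=32, sigma=2.0):
    """Toy differentiable renderer (assumption): a 1D Gaussian blob at `center`."""
    return [math.exp(-((u - center) ** 2) / (2 * sigma ** 2)) for u in range(width)]

def loss_and_grad(center, target, sigma=2.0):
    """Squared photometric error and its analytic derivative w.r.t. `center`."""
    image = render(center, len(target), sigma)
    loss, grad = 0.0, 0.0
    for u, (pred, obs) in enumerate(zip(image, target)):
        residual = pred - obs
        loss += residual ** 2
        # d(pred)/d(center) = pred * (u - center) / sigma^2
        grad += 2.0 * residual * pred * (u - center) / sigma ** 2
    return loss, grad

def refine(center, target, steps=200, lr=0.2):
    """Analysis-by-synthesis loop: render, compare with the observation, update."""
    for _ in range(steps):
        _, grad = loss_and_grad(center, target)
        center -= lr * grad
    return center

# Observed image: the object truly sits at position 18.
target = render(18.0)
# Start from a coarse initial guess (e.g. a detector output) and refine it.
estimate = refine(15.0, target)
```

In a full system the scalar position would be replaced by object poses, shapes, and appearance codes, and the toy renderer by a differentiable renderer over the whole scene; the loop structure stays the same.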
