论文信息 - Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation

Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation

Objects class, depth, and shape are instantly reconstructed by a human looking at a 2D image. While modern deep models solve each of these challenging tasks separately, they struggle to perform simultaneous scene 3D reconstruction and segmentation. We propose a single shot image-to-semantic voxel model translation framework. We train a generator adversarially against a discriminator that verifies the object’s poses. Furthermore, trapezium-shaped voxels, volumetric residual blocks, and 2D-to-3D skip connections facilitate our model learning explicit reasoning about 3D scene structure. We collected a SemanticVoxels dataset with 116k images, ground-truth semantic voxel models, depth maps, and 6D object poses. Experiments on ShapeNet and our SemanticVoxels datasets demonstrate that our framework achieves and surpasses stateof-the-art in the reconstruction of scenes with multiple non-rigid objects of different classes. We made our model and dataset publicly available.

[1] Gordon Wetzstein,et al. DeepVoxels: Learning Persistent 3D Feature Embeddings , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Didier Stricker,et al. IsMo-GAN: Adversarial Learning for Monocular Non-Rigid 3D Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[3] R. Reulke,et al. Remote Sensing and Spatial Information Sciences , 2005 .

[4] Ali Farhadi,et al. YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Yaonan Wang,et al. 3D Face Reconstruction from Light Field Images: A Model-free Approach , 2017, ECCV.

[6] Charless C. Fowlkes,et al. 3D Scene Reconstruction With Multi-Layer Depth and Epipolar Transformers , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Alexey Dosovitskiy,et al. Unsupervised Learning of Shape and Pose with Differentiable Point Clouds , 2018, NeurIPS.

[8] V. A. Mizginov,et al. EVALUATING THE ACCURACY OF 3D OBJECT RECONSTRUCTION FROM THERMAL IMAGES , 2019 .

[9] Walter G. Kropatsch,et al. ThermalGAN: Multimodal Color-to-Thermal Image Translation for Person Re-identification in Multispectral Dataset , 2018, ECCV Workshops.

[10] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .

[11] Leonidas J. Guibas,et al. Multiview Aggregation for Learning Category-Specific Shape Reconstruction , 2019, NeurIPS.

[12] Alexei A. Efros,et al. Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Serge J. Belongie,et al. Learning Single-View 3D Reconstruction with Limited Pose Supervision , 2018, ECCV.

[14] Yuning Jiang,et al. Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.

[15] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[16] Haojie Li,et al. Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17] Jiajun Wu,et al. Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] Vladimir V. Kniaz,et al. LONG WAVE INFRARED IMAGE COLORIZATION FOR PERSON RE-IDENTIFICATION , 2019, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

[19] Patrick Pérez,et al. MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20] James M. Rehg,et al. 3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21] Tao Yu,et al. DeepHuman: 3D Human Reconstruction From a Single Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22] Georgios Tzimiropoulos,et al. Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23] Abhinav Gupta,et al. Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.

[24] Fabio Remondino,et al. Image-to-Voxel Model Translation with Conditional Adversarial Networks , 2018, ECCV Workshops.

[25] Qiang Xu,et al. nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Thomas Brox,et al. Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27] Fabio Remondino,et al. GENERATIVE ADVERSARIAL NETWORKS FOR SINGLE PHOTO 3D RECONSTRUCTION , 2019, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

[28] Luc Van Gool,et al. Progressive Structure from Motion , 2018, ECCV.

[29] Fabio Remondino,et al. The Point Where Reality Meets Fantasy: Mixed Adversarial Generators for Image Splice Detection , 2019, NeurIPS.

[30] Ersin Yumer,et al. PlaneNet: Piece-Wise Planar Reconstruction from a Single RGB Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31] Jiajun Wu,et al. Learning Shape Priors for Single-View 3D Completion and Reconstruction , 2018, ECCV.

[32] Stefan Roth,et al. Matryoshka Networks: Predicting 3D Geometry via Nested Shape Layers , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33] Silvio Savarese,et al. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Iasonas Kokkinos,et al. DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Subhransu Maji,et al. 3D Shape Segmentation with Projective Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Matthias Nießner,et al. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Jiajun Wu,et al. Learning to Reconstruct Shapes from Unseen Classes , 2018, NeurIPS.

[40] Vladimir V. Kniaz,et al. Deep Learning a Single Photo Voxel Model Prediction from Real and Synthetic Images , 2019, Studies in Computational Intelligence.

[41] Ron Kimmel,et al. Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42] Xiaojuan Qi,et al. GAL: Geometric Adversarial Loss for Single-View 3D-Object Reconstruction , 2018, ECCV.

[43] Alexei A. Efros,et al. Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Hao Su,et al. A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45] Francesc Moreno-Noguer,et al. Image Collection Pop-up: 3D Reconstruction and Clustering of Rigid and Non-rigid Categories , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46] Yang Yang,et al. Learning Category-Specific 3D Shape Models from Weakly Labeled 2D Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47] Yi Zhou,et al. Single-View Hair Reconstruction using Convolutional Neural Networks , 2018, ECCV.

[48] Ioannis A. Kakadiaris,et al. End-to-End 3D Face Reconstruction with Deep Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49] Ian D. Reid,et al. Efficient Dense Point Cloud Object Reconstruction Using Deformation Vector Fields , 2018, ECCV.

[50] Duygu Ceylan,et al. DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction , 2019, NeurIPS.

[51] Jianfei Cai,et al. T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks , 2018, ECCV.

[52] Ian D. Reid,et al. Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53] Derek Hoiem,et al. Pixels, Voxels, and Views: A Study of Shape Representations for Single View 3D Object Shape Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54] Jan Kautz,et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55] Yin Zhou,et al. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56] Shengping Zhang,et al. Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57] Hongdong Li,et al. Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[58] John F. Canny,et al. A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[60] Georgios Tzimiropoulos,et al. 3D Human Body Reconstruction from a Single Image via Volumetric Regression , 2018, ECCV Workshops.

[61] Simon Lucey,et al. Rethinking Reprojection: Closing the Loop for Pose-Aware Shape Reconstruction from a Single Image , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[62] Karthik Ramani,et al. SurfNet: Generating 3D Shape Surfaces Using Deep Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[63] Juergen Gall,et al. Two Stream 3D Semantic Scene Completion , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[64] Silvio Savarese,et al. Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[65] Jitendra Malik,et al. Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[66] Matthias Nießner,et al. PlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction , 2018, ECCV.

[67] Edmond Boyer,et al. Shape Reconstruction Using Volume Sweeping and Learned Photoconsistency , 2018, ECCV.

[68] Leonidas J. Guibas,et al. ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[69] Matan Sela,et al. Learning Detailed Face Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70] Xiaowei Zhou,et al. Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71] Song-Chun Zhu,et al. Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image , 2018, ECCV.

[72] Jiajun Wu,et al. MarrNet: 3D Shape Reconstruction via 2.5D Sketches , 2017, NIPS.

[73] Jiajun Wu,et al. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[74] Subhransu Maji,et al. Shape Reconstruction Using Differentiable Projections and Deep Priors , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[75] Kyoung Mu Lee,et al. V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[76] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[77] Silvio Savarese,et al. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.