Paper information - Image-to-Voxel Model Translation with Conditional Adversarial Networks

We present a single-view voxel model prediction method that uses generative adversarial networks. Our method uses correspondences between 2D silhouettes and slices of a camera frustum to predict a voxel model of a scene containing multiple object instances. We exploit a pyramid-shaped voxel model and a generator network with skip connections between 2D and 3D feature maps. To train our framework, we collected two datasets, VoxelCity and VoxelHome, comprising 36,416 images of 28 scenes with ground-truth 3D models, depth maps, and 6D object poses. We have made the datasets publicly available (http://www.zefirus.org/Z_GAN). We evaluate our framework on 3D shape datasets and show that it delivers robust 3D scene reconstruction results that compete with and surpass the state of the art in reconstructing scenes with multiple non-rigid objects.
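The correspondence between 2D silhouettes and frustum slices can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the uniform depth spacing of the slices, and the `(slice, row, column)` layout are assumptions made only for illustration. The idea shown is that in a frustum-aligned (pyramid-shaped) voxel grid every voxel column lies along one pixel's viewing ray, so collapsing the grid along the depth axis yields the 2D silhouette.

```python
import numpy as np


def depth_to_frustum_voxels(depth, z_near, z_far, num_slices):
    """Mark, for each pixel, the frustum slice that contains the observed surface.

    Assumption: slices are spaced uniformly in depth between z_near and z_far.
    Pixels whose depth falls outside [z_near, z_far) stay empty.
    """
    height, width = depth.shape
    edges = np.linspace(z_near, z_far, num_slices + 1)
    # np.digitize returns i with edges[i-1] <= x < edges[i]; shift to 0-based slices.
    slice_idx = np.digitize(depth, edges) - 1
    valid = (slice_idx >= 0) & (slice_idx < num_slices)

    voxels = np.zeros((num_slices, height, width), dtype=bool)
    rows, cols = np.nonzero(valid)
    voxels[slice_idx[rows, cols], rows, cols] = True
    return voxels


def silhouette(voxels):
    """Project the frustum voxel grid onto the image plane along the depth axis."""
    return voxels.any(axis=0)


# Toy example: a 2x2 object at depth 2.5 in an otherwise empty 4x4 view.
depth = np.zeros((4, 4))
depth[1:3, 1:3] = 2.5
voxels = depth_to_frustum_voxels(depth, z_near=1.0, z_far=5.0, num_slices=4)
mask = silhouette(voxels)
```

Because each slice shares the pixel grid of the input image, a 2D feature map and a slice of a 3D feature map can be aligned element by element, which is the property that skip connections between 2D and 3D feature maps would rely on.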
