GENERATIVE ADVERSARIAL NETWORKS FOR SINGLE PHOTO 3D RECONSTRUCTION

Fast yet precise 3D reconstructions of cultural heritage scenes are increasingly in demand in archaeology and architecture. While modern multi-image 3D reconstruction approaches provide impressive results in terms of textured surface models, there is often a need to create a 3D model of an object for which only a single photo (or a few sparse photos) is available. This paper focuses on the single-photo 3D reconstruction problem for lost cultural objects of which only a few images remain. We use the image-to-voxel translation network (Z-GAN) as a starting point. The Z-GAN network uses skip connections in its generator to transfer 2D features effectively to a 3D voxel model (Figure 1). As a result, the network can generate voxel models of previously unseen objects using the object silhouettes present in the input image and the knowledge obtained during the training stage. To train our Z-GAN network, we created a large dataset that includes aligned sets of images and corresponding voxel models of an ancient Greek temple. We evaluated the Z-GAN network for single-photo reconstruction on complex structures such as temples, as well as on lost heritage still available in crowdsourced images. A comparison of the reconstruction results with state-of-the-art methods is also presented and discussed.

Figure 1: Overview of the Z-GAN generator network employed for 3D reconstruction from a single image.
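The abstract does not specify how the generator's skip connections carry 2D encoder features into the 3D decoder. The following is a minimal NumPy sketch under an assumed scheme: each 2D feature map of shape (C, H, W) is replicated along a new depth axis and concatenated channel-wise with a 3D decoder volume of matching height and width, in the spirit of U-Net skip connections. The function names `lift_2d_to_3d` and `skip_connect`, the tensor sizes, and the copy-along-depth choice are all hypothetical illustrations, not the exact Z-GAN design; the batch dimension is omitted for brevity.

```python
import numpy as np


def lift_2d_to_3d(features_2d: np.ndarray, depth: int) -> np.ndarray:
    """Replicate a 2D feature map (C, H, W) along a new depth axis -> (C, D, H, W).

    Assumption: copying features along depth is one simple way to pass 2D
    encoder features to a 3D decoder; it is not necessarily Z-GAN's design.
    """
    if features_2d.ndim != 3:
        raise ValueError("expected a (C, H, W) feature map")
    c, h, w = features_2d.shape
    # Insert a singleton depth axis, broadcast it to the requested size, then
    # copy so the result is a regular writable array rather than a view.
    return np.broadcast_to(features_2d[:, None, :, :], (c, depth, h, w)).copy()


def skip_connect(encoder_2d: np.ndarray, decoder_3d: np.ndarray) -> np.ndarray:
    """Concatenate lifted 2D encoder features with 3D decoder features along channels."""
    _, d, h, w = decoder_3d.shape
    if encoder_2d.shape[1:] != (h, w):
        raise ValueError("encoder and decoder spatial sizes must match")
    lifted = lift_2d_to_3d(encoder_2d, depth=d)
    # Encoder channels come first, decoder channels follow.
    return np.concatenate([lifted, decoder_3d], axis=0)


if __name__ == "__main__":
    enc = np.random.rand(64, 32, 32).astype(np.float32)       # hypothetical encoder output
    dec = np.random.rand(128, 32, 32, 32).astype(np.float32)  # hypothetical decoder volume
    out = skip_connect(enc, dec)
    print(out.shape)  # (192, 32, 32, 32)
```

Replicating features along depth keeps the skip connection parameter-free; the subsequent 3D convolutions would then have to learn where along the viewing direction each 2D feature belongs, which is one plausible way silhouette information from the input image could constrain the voxel output.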
