Learning Typical 3D Representation from a Single 2D Correspondence Using 2D-3D Transformation Network

Understanding 3D environments with deep learning is of considerable interest in the computer vision community due to its wide variety of applications. However, learning good 3D representations is challenging for many reasons, including the high dimensionality of the data, the modeling complexity introduced by symmetric objects, the required computational cost, and the fact that the expected variation scales up by roughly a factor of 1.5 compared to learning 2D representations. In this paper we address the problem of learning a typical 3D representation by transformation from its single corresponding 2D representation, using a deep-autoencoder-based Generative Adversarial Network termed the 2D-3D Transformation Network (2D-3D-TNET). The objective of the proposed model combines the traditional GAN loss with an autoencoder loss, which allows the generator to effectively generate 3D objects corresponding to the input 2D images. Furthermore, instead of training the discriminator to distinguish real objects from generated ones, we let it take both a 2D image and a 3D object as input and learn to decide whether they are a true correspondence. The discriminator thus learns and encodes the relationship between 3D objects and their corresponding projected 2D images. In addition, our model does not require labelled data and trains in an unsupervised manner. Experiments are conducted on ISRI_DB, which contains real 3D daily-life objects, as well as on the standard ModelNet40 dataset. The experimental results demonstrate that our model effectively transforms 2D images into their corresponding 3D representations and learns a richer relationship between the two modalities than the traditional 3D Generative Adversarial Network (3D-GAN) and Deep Autoencoder (DAE).
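The two ideas in the abstract, a generator objective that sums a GAN term with an autoencoder reconstruction term, and a discriminator that scores (2D image, 3D object) pairs as true or false correspondences, can be sketched as loss functions. This is a minimal illustration with dummy tensors, not the paper's implementation: the network architectures are omitted, and the weighting factor `lam` and the L2 reconstruction term are assumptions.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between predicted probabilities and targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def generator_loss(d_score_fake, voxels_fake, voxels_real, lam=10.0):
    """GAN term (fool the pair discriminator) plus an autoencoder
    reconstruction term, as the combined objective described in the paper.
    The L2 reconstruction and the weight lam are illustrative assumptions."""
    gan_term = bce(d_score_fake, np.ones_like(d_score_fake))
    recon_term = float(np.mean((voxels_fake - voxels_real) ** 2))
    return gan_term + lam * recon_term

def discriminator_loss(d_score_true_pair, d_score_false_pair):
    """The discriminator sees (2D image, 3D object) pairs and learns to
    label true correspondences 1 and mismatched/generated pairs 0."""
    return bce(d_score_true_pair, np.ones_like(d_score_true_pair)) \
         + bce(d_score_false_pair, np.zeros_like(d_score_false_pair))

# Toy example standing in for network outputs on a batch of 4 samples.
rng = np.random.default_rng(0)
d_fake = rng.uniform(0.1, 0.9, size=(4, 1))          # D(image, G(image))
vox_fake = rng.uniform(0, 1, size=(4, 32, 32, 32))   # generated voxel grids
vox_real = rng.uniform(0, 1, size=(4, 32, 32, 32))   # ground-truth voxel grids

g_loss = generator_loss(d_fake, vox_fake, vox_real)
d_loss = discriminator_loss(rng.uniform(0.6, 0.9, (4, 1)),
                            rng.uniform(0.1, 0.4, (4, 1)))
print(g_loss, d_loss)
```

Because the discriminator judges correspondence rather than realism alone, its gradient pushes the generator toward 3D shapes that are consistent with the conditioning 2D view, which is the stated advantage over a plain 3D-GAN discriminator.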
