HGAN: Holistic Generative Adversarial Networks for Two-dimensional Image-based Three-dimensional Object Retrieval

In this article, we propose a novel method for two-dimensional (2D) image-based three-dimensional (3D) object retrieval. First, we render a set of virtual views to represent each 3D object, and a soft-attention model weights these views to select one characteristic view per object. Second, we propose a novel Holistic Generative Adversarial Network (HGAN) to address the cross-domain feature representation problem by aligning the feature space of the virtual characteristic view with that of the real image, which effectively mitigates the distribution discrepancy between the 2D image domain and the 3D object domain. Finally, we use the generative model of the HGAN to obtain a “virtual real image” for each 3D object, so that the characteristic view of the 3D object and the real image share the same feature space for retrieval. To evaluate our approach, we built a new dataset of paired 2D images and 3D objects, where the 3D objects are drawn from the ModelNet40 dataset. The experimental results demonstrate the superiority of the proposed method over state-of-the-art methods.
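To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of the two stages described above: soft-attention weighting of rendered views to pick a characteristic view, and an adversarial loss that pulls the characteristic-view features toward the real-image feature space. All layer sizes, the module names (SoftAttentionViewSelector, FeatureAligner), and the backbone assumed to produce the view features are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of the two key steps the
# abstract describes: (1) soft-attention selection of a characteristic view and
# (2) adversarial alignment of its features with the real-image feature space.
# Feature dimensions, layer sizes, and module names are illustrative assumptions.
import torch
import torch.nn as nn

class SoftAttentionViewSelector(nn.Module):
    """Scores the V rendered views of an object and keeps the top-weighted one."""
    def __init__(self, feat_dim=512, hidden_dim=128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, view_feats):                          # view_feats: (B, V, D)
        weights = torch.softmax(self.score(view_feats).squeeze(-1), dim=1)  # (B, V)
        idx = weights.argmax(dim=1)                          # index of characteristic view
        char_view = view_feats[torch.arange(view_feats.size(0)), idx]       # (B, D)
        return char_view, weights

class FeatureAligner(nn.Module):
    """Generator that maps characteristic-view features toward the real-image
    domain, plus a discriminator that supplies the adversarial signal."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.generator = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )
        self.discriminator = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1)
        )

def adversarial_losses(aligner, view_feat, real_img_feat):
    """Standard GAN objective: D separates real-image features from translated
    view features; G tries to make the two indistinguishable."""
    bce = nn.BCEWithLogitsLoss()
    fake = aligner.generator(view_feat)
    d_real = aligner.discriminator(real_img_feat)
    d_fake = aligner.discriminator(fake.detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    g_out = aligner.discriminator(fake)
    g_loss = bce(g_out, torch.ones_like(g_out))
    return d_loss, g_loss
```

In such a setup, the view features would come from a shared CNN backbone, and retrieval would be performed by nearest-neighbor search between real-image features and the translated characteristic-view features.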
