Dual-level Embedding Alignment Network for 2D Image-Based 3D Object Retrieval

Recent advances in 3D modeling software and 3D capture devices contribute to the availability of large-scale 3D objects. However, manually labelled large-scale 3D object datasets are still too expensive to build in practice. An intuitive idea is to transfer knowledge from label-rich 2D images (source domain) to unlabelled 3D objects (target domain) to facilitate 3D big data management. In this paper, we propose an unsupervised dual-level embedding alignment (DLEA) network for a new task, 2D image-based 3D object retrieval. It consists of two jointly optimized modules: visual feature learning and cross-domain feature adaptation. The first module renders each 3D object as a set of multi-view images and uses 2D CNNs to extract visual features from both the multi-view image sets and the source 2D images. To fuse the multiple views while reducing the distribution divergence between the two domains, we propose a cross-domain view-wise attention mechanism that adaptively computes the weight of each view and aggregates the views into a compact descriptor, narrowing the gap between the source and target domains. Given the visual representations of both domains, the cross-domain feature adaptation module enforces domain-level and class-level alignment of the two embedding spaces. For domain-level embedding alignment, we train a discriminator to align the global distribution statistics of the two spaces. For class-level embedding alignment, we map features of the same class from different domains close together by aligning the per-class centroids of both domains. To our knowledge, this is the first unsupervised work to jointly realize cross-domain feature learning and distribution alignment in an end-to-end manner for this new task. Moreover, we constructed two new datasets, MI3DOR and MI3DOR-2, to promote research on this topic. Extensive comparison experiments demonstrate the superiority of DLEA over state-of-the-art methods.
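The following is a minimal sketch, not the authors' released code, of the three ideas named in the abstract: view-wise attention fusion, a domain discriminator for domain-level alignment, and per-class centroid alignment for class-level alignment. The PyTorch setup, module names, feature dimensions, the simple softmax scoring head, and the use of pseudo-labels for the unlabelled target domain are all assumptions for illustration.

```python
# Illustrative sketch only; dimensions, names, and pseudo-label usage are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewAttentionFusion(nn.Module):
    """Aggregate V per-view CNN features into one compact descriptor via learned view weights."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # hypothetical per-view scoring head

    def forward(self, view_feats):           # view_feats: (B, V, feat_dim)
        weights = F.softmax(self.score(view_feats), dim=1)   # (B, V, 1) attention weights
        return (weights * view_feats).sum(dim=1)             # (B, feat_dim) fused descriptor

class DomainDiscriminator(nn.Module):
    """Binary classifier trained adversarially to align global (domain-level) statistics."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, feats):
        return self.net(feats)                # logits: source vs. target

def class_centroid_loss(src_feats, src_labels, tgt_feats, tgt_pseudo_labels, num_classes):
    """Class-level alignment: pull per-class feature centroids of the two domains together."""
    loss = src_feats.new_zeros(())
    for c in range(num_classes):
        s_mask = src_labels == c
        t_mask = tgt_pseudo_labels == c       # target labels assumed to be pseudo-labels
        if s_mask.any() and t_mask.any():
            loss = loss + F.mse_loss(src_feats[s_mask].mean(0),
                                     tgt_feats[t_mask].mean(0))
    return loss / num_classes
```

In a training loop under these assumptions, the fused 3D descriptors and the 2D image features would feed both the discriminator (with a gradient-reversal or alternating adversarial objective) and the centroid loss, alongside a standard classification loss on the labelled source images.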
