Joint analysis of shapes and images via deep domain adaptation

Abstract

3D shapes and 2D images often carry complementary information, so analyzing them jointly can benefit problems in both domains. By leveraging the connection between 2D images and 3D shapes, information missing from one modality can potentially be recovered from the other. Building on this insight, we design and implement a CNN architecture that jointly analyzes shapes and images even when little training data is available. The core of the architecture is a domain adaptation algorithm that connects the underlying feature spaces of images and shapes, then aligns and correlates their intrinsic structures. The proposed method facilitates both recognition and retrieval tasks. Experiments on shape recognition show that our approach achieves superior performance in two difficult settings: zero-shot learning and few-shot learning. We also evaluate our method on retrieval tasks and demonstrate its effectiveness.
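The abstract does not say which criterion the domain adaptation algorithm uses to align the image and shape feature spaces. The Python sketch below is a hedged illustration only. It assumes a maximum mean discrepancy (MMD) loss with a Gaussian kernel, a common choice for distribution alignment that is not necessarily the authors' loss. It then performs cross-modal retrieval by nearest-neighbour search, assuming that image and shape features already live in a shared space of the same dimension. All function names (`gaussian_kernel`, `mean_kernel`, `mmd2`, `retrieve`), the bandwidth `sigma`, and the toy feature values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(a: List[Vector], b: List[Vector], sigma: float) -> float:
    """Average kernel value over all pairs (x in a, y in b)."""
    total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
    return total / (len(a) * len(b))


def mmd2(source: List[Vector], target: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two feature sets.

    Assumption: MMD stands in for the paper's unspecified alignment
    criterion; a smaller value means the two distributions are closer.
    """
    return (mean_kernel(source, source, sigma)
            + mean_kernel(target, target, sigma)
            - 2.0 * mean_kernel(source, target, sigma))


def retrieve(query: Vector, gallery: List[Vector]) -> int:
    """Index of the gallery feature nearest (squared Euclidean) to the query."""
    def sq_dist(v: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, v))
    return min(range(len(gallery)), key=lambda i: sq_dist(gallery[i]))


# Hypothetical toy 2-D features; real CNN features would be high-dimensional.
images = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shapes_near = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]]  # roughly aligned with images
shapes_far = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]   # misaligned with images

print(mmd2(images, shapes_near) < mmd2(images, shapes_far))  # True
print(retrieve([0.9, 0.1], shapes_near))  # 1
```

In a typical MMD-based adaptation setup, a term like `mmd2` would be minimized jointly with a recognition loss so that features from both modalities become indistinguishable in distribution. This toy example omits that training loop and only shows how the alignment score and the retrieval step are computed.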