Semantics-enhanced supervised deep autoencoder for depth image-based 3D model retrieval

The increased accuracy and affordability of depth sensors such as Kinect have created a rich source of depth data for 3D-oriented applications. In particular, 3D model retrieval is attracting attention in computer vision and pattern recognition because of its numerous applications. Cross-domain retrieval, such as depth image-based 3D model retrieval, faces occlusion, noise, and view variability in both query and training data. In this paper, we propose a new supervised deep autoencoder approach followed by semantic modeling to retrieve 3D shapes from depth images. The key novelty is a two-fold feature abstraction that copes with the incompleteness and ambiguity of depth images. First, we develop a supervised autoencoder to extract robust features from both real depth images and synthetic ones rendered from 3D models; the features are intended to balance reconstruction and classification capabilities on mixed-domain data. Then, semantic modeling of the supervised autoencoder features provides a second level of abstraction to cope with the incompleteness and ambiguity of the depth data. Unlike pairwise model structures, we argue that cross-domain retrieval is possible using a single deep network trained on real and synthetic data. Experimental results on the NYUD2 and ModelNet10 datasets demonstrate that the proposed supervised method outperforms recent approaches for cross-modal 3D model retrieval.
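The abstract does not state the training objective, so the following is only a minimal sketch of what balancing reconstruction and classification could look like, assuming a mean-squared reconstruction error plus a weighted softmax cross-entropy term. The function names and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])


def supervised_autoencoder_loss(x, x_hat, logits, label, lam=1.0):
    """Assumed joint objective: reconstruction error plus lam * classification loss.

    lam is a hypothetical trade-off weight; a larger value favors class
    discrimination over faithful reconstruction.
    """
    return mse(x, x_hat) + lam * cross_entropy(logits, label)
```

Under this assumption, the same loss would be applied to real depth images and to synthetic renderings alike, which is one way a single network could be trained on mixed-domain data.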
