Learning Pairwise Neural Network Encoder for Depth Image-based 3D Model Retrieval

With the emergence of RGB-D cameras (e.g., Kinect), the sensing capability of artificial intelligence systems has increased dramatically, and as a consequence, a wide range of depth image-based human-machine interaction applications have been proposed. In the design industry, a 3D model contains abundant information that is required for manufacturing. Since depth images can be conveniently acquired, a retrieval system that returns 3D models based on depth image inputs can assist or improve the traditional product design process. In this work, we address the depth image-based 3D model retrieval problem. By extending the neural network to a neural network pair with identical output layers for objects of the same category, unified domain-invariant representations can be learned from the mismatched low-level depth image features and 3D model features. A unique advantage of the framework is that correspondence information between depth images and 3D models is not required, so it can easily be generalized to large-scale databases. To evaluate the effectiveness of our approach, depth images (with Kinect-type noise) in the NYU Depth V2 dataset are used as queries to retrieve 3D models of the same categories in the SHREC 2014 dataset. Experimental results suggest that our approach can outperform state-of-the-art methods, as well as the paradigm that directly uses the original representations of depth images and 3D models for retrieval.
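The pairwise encoder idea can be illustrated with a minimal sketch. This is not the architecture or training procedure used in this work; it only assumes that two small encoders, one per domain, are each trained to map their own low-level features to a target code shared by all objects of the same category. The one-hidden-layer networks, the one-hot target codes, the squared-error loss, the feature dimensions, and the synthetic features (`make_domain`) are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES, CODE_DIM = 3, 3  # assumed sizes, for illustration only


def make_domain(in_dim, n_per_class, centers=None, noise=0.1):
    """Synthetic stand-in for the low-level features of one domain."""
    if centers is None:
        centers = rng.normal(0.0, 1.0, (N_CLASSES, in_dim))
    labels = np.repeat(np.arange(N_CLASSES), n_per_class)
    feats = centers[labels] + rng.normal(0.0, noise, (len(labels), in_dim))
    return feats, labels, centers


def init_encoder(in_dim, hidden_dim, out_dim):
    """One-hidden-layer encoder (assumed architecture)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def encode(p, x):
    """Return hidden activations and output codes for a batch x."""
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]


def train(p, x, targets, lr=0.05, epochs=1000):
    """Full-batch gradient descent on 0.5 * mean squared error."""
    n = len(x)
    for _ in range(epochs):
        h, out = encode(p, x)
        grad_out = (out - targets) / n
        # Backpropagate through the tanh layer before updating W2.
        grad_h = (grad_out @ p["W2"].T) * (1.0 - h ** 2)
        p["W2"] -= lr * (h.T @ grad_out)
        p["b2"] -= lr * grad_out.sum(axis=0)
        p["W1"] -= lr * (x.T @ grad_h)
        p["b1"] -= lr * grad_h.sum(axis=0)
    return p


# One shared target code per category; both encoders regress to it.
codes = np.eye(N_CLASSES)

# The two domains have different feature dimensions and no paired samples.
depth_x, depth_y, depth_centers = make_domain(8, 30)
model_x, model_y, _ = make_domain(12, 20)

depth_enc = train(init_encoder(8, 16, CODE_DIM), depth_x, codes[depth_y])
model_enc = train(init_encoder(12, 16, CODE_DIM), model_x, codes[model_y])

# Retrieval: encode unseen depth queries and rank 3D models by distance
# in the shared output space.
query_x, query_y, _ = make_domain(8, 10, centers=depth_centers)
_, q = encode(depth_enc, query_x)
_, g = encode(model_enc, model_x)
dists = np.linalg.norm(q[:, None, :] - g[None, :, :], axis=2)
top1 = model_y[dists.argmin(axis=1)]
accuracy = float((top1 == query_y).mean())
print(f"top-1 category accuracy: {accuracy:.2f}")
```

Because each encoder is supervised only by category-level target codes, the sketch never needs to know which depth image corresponds to which 3D model, which is the property the abstract highlights.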
