3D shape retrieval using a single depth image from low-cost sensors

Content-based 3D shape retrieval is an important problem in computer vision. Traditional retrieval interfaces require a 2D sketch or a manually designed 3D model as the query, which is difficult to specify and thus not practical in real applications. With the recent advance in low-cost 3D sensors such as Microsoft Kinect and Intel Realsense, capturing depth images that carry 3D information is fairly simple, making shape retrieval more practical and user-friendly. In this paper, we study the problem of cross-domain 3D shape retrieval using a single depth image from low-cost sensors as the query to search for similar human designed CAD models. We propose a novel method using an ensemble of autoencoders in which each autoencoder is trained to learn a compressed representation of depth views synthesize d from each database object. By viewing each autoencoder as a probabilistic model, a likelihood score can be derived as a similarity measure. A domain adaptation layer is built on top of autoencoder outputs to explicitly address the cross-domain issue (between noisy sensory data and clean 3D models) by incorporating training data of sensor depth images and their category labels in a weakly supennsed learning formulation. Experiments using real-world depth images and a large-scale CAD dataset demonstrate the effectiveness of our approach, which offers significant improvements over state-of-the-art 3D shape retrieval methods.

[1]  Zhang Xiong,et al.  3D object retrieval with stacked local convolutional autoencoder , 2015, Signal Process..

[2]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[3]  Roland Memisevic,et al.  On autoencoder scoring , 2013, ICML.

[4]  Tobias Schreck,et al.  SHREC'13 Track: Large-Scale Partial Shape Retrieval Using Simulated Range Images , 2013, 3DOR@Eurographics.

[5]  Geoffrey E. Hinton,et al.  Modeling the joint density of two images under a variety of transformations , 2011, CVPR 2011.

[6]  Xiaogang Wang,et al.  Hierarchical face parsing via deep learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Ming-Hsuan Yang,et al.  Visual tracking with online Multiple Instance Learning , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Thomas A. Funkhouser,et al.  The Princeton Shape Benchmark , 2004, Proceedings Shape Modeling Applications, 2004..

[9]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[11]  Ryutarou Ohbuchi,et al.  SHREC'12 Track: Generic 3D Shape Retrieval , 2012, 3DOR@Eurographics.

[12]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[13]  R. Horaud,et al.  Surface feature detection and description with applications to mesh matching , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[15]  Ming Ouhyoung,et al.  On Visual Similarity Based 3D Model Retrieval , 2003, Comput. Graph. Forum.

[16]  Yoshua Bengio,et al.  What regularized auto-encoders learn from the data-generating distribution , 2012, J. Mach. Learn. Res..

[17]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[18]  Jun Wang,et al.  From Low-Cost Depth Sensors to CAD: Cross-Domain 3D Shape Retrieval via Regression Tree Fields , 2014, ECCV.

[19]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[20]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Petros Daras,et al.  SHREC'09 Track: Querying with Partial Models , 2009, 3DOR@Eurographics.

[22]  Leonidas J. Guibas,et al.  One Point Isometric Matching with the Heat Kernel , 2010, Comput. Graph. Forum.

[23]  Jian Sun,et al.  Salient object detection by composition , 2011, 2011 International Conference on Computer Vision.

[24]  Rongrong Ji,et al.  Label Propagation from ImageNet to 3D Point Clouds , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[26]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[27]  Stan Sclaroff,et al.  Improved feature descriptors for 3D surface matching , 2007, SPIE Optics East.

[28]  Jianxiong Xiao,et al.  Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[29]  Dieter Fox,et al.  Depth kernel descriptors for object recognition , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[30]  Petros Daras,et al.  A Compact Multi-view Descriptor for 3D Object Retrieval , 2009, 2009 Seventh International Workshop on Content-Based Multimedia Indexing.