Analysis of Compact Features for RGB-D Visual Search

Anticipating the upcoming integration of depth sensing into mobile devices, we experimentally compare compact features for representing RGB-D images in mobile visual search. Experiments on three state-of-the-art datasets, covering both category and instance recognition, show that Deep Features provided by Convolutional Neural Networks better represent appearance information, whereas shape is more effectively encoded by Kernel Descriptors. Moreover, our evaluation suggests that learning to weight the relative contributions of depth and appearance is key to deploying depth sensing effectively in forthcoming mobile visual search scenarios.
