Depth kernel descriptors for object recognition

Consumer depth cameras, such as the Microsoft Kinect, are capable of providing frames of dense depth values at real time. One fundamental question in utilizing depth cameras is how to best extract features from depth frames. Motivated by local descriptors on images, in particular kernel descriptors, we develop a set of kernel features on depth images that model size, 3D shape, and depth edges in a single framework. Through extensive experiments on object recognition, we show that (1) our local features capture different aspects of cues from a depth frame/view that complement one another; (2) our kernel features significantly outperform traditional 3D features (e.g. Spin images); and (3) we significantly improve the capabilities of depth and RGB-D (color+depth) recognition, achieving 10–15% improvement in accuracy over the state of the art.

[1]  Dieter Fox,et al.  Object recognition with hierarchical kernel descriptors , 2011, CVPR 2011.

[2]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[3]  Andrew Y. Ng,et al.  The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[4]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[6]  Nico Blodow,et al.  Fast Point Feature Histograms (FPFH) for 3D registration , 2009, 2009 IEEE International Conference on Robotics and Automation.

[7]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[8]  Cristian Sminchisescu,et al.  Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.

[9]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[11]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[12]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[13]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Wolfram Burgard,et al.  Point feature extraction on 3D range scans taking into account object boundaries , 2011, 2011 IEEE International Conference on Robotics and Automation.

[15]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[16]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[17]  Dieter Fox,et al.  Kernel Descriptors for Visual Recognition , 2010, NIPS.

[18]  Ming Yang,et al.  Large-scale image classification: Fast feature extraction and SVM training , 2011, CVPR 2011.

[19]  Nassir Navab,et al.  Model globally, match locally: Efficient and robust 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Dieter Fox,et al.  Object Recognition in 3D Point Clouds Using Web Data and Domain Adaptation , 2010, Int. J. Robotics Res..

[21]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[22]  Wolfram Burgard,et al.  Unsupervised discovery of object classes from range data using latent Dirichlet allocation , 2009, Robotics: Science and Systems.

[23]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[24]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..