Recognizing in the depth: Selective 3D Spatial Pyramid Matching Kernel for object and scene categorization

This paper proposes a novel approach to recognizing object and scene categories in depth images. We introduce a Bag of Words (BoW) representation in 3D, the Selective 3D Spatial Pyramid Matching Kernel (3DSPMK). The approach starts by quantizing 3D local descriptors, computed from point clouds, to build a vocabulary of 3D visual words. This codebook is used to build the 3DSPMK, which starts by partitioning a working volume into fine sub-volumes and then computes a hierarchical weighted sum of histogram intersections of visual words at each level of the 3D pyramid structure. To increase both the classification accuracy and the computational efficiency of the kernel, we propose two selective hierarchical volume decomposition strategies, based on representative and discriminative sub-volume selection processes, which drastically reduce the size of the pyramid to consider. Results on different RGBD datasets show that our approaches obtain state-of-the-art results for both object recognition and scene categorization.

Highlights: We introduce the 3DSPMK for object and scene recognition in depth images. Our model repeatedly subdivides a cube inscribed in the point cloud. Then, a weighted sum of histograms of visual word occurrences is computed. Results on publicly available benchmarks have been reported.
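The hierarchical weighted sum of histogram intersections described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the level weights are borrowed from the standard 2D spatial pyramid matching scheme (an assumption, since the abstract does not state them), the selective sub-volume pruning is not shown, and the function names `cell_histograms`, `intersection`, and `pyramid_match_3d` are hypothetical. Points are assumed to be already assigned to visual words and to lie inside a cube with corner `lo` and side `size`.

```python
from collections import Counter


def cell_histograms(points, words, level, lo, size):
    """Count visual words per sub-volume at one pyramid level.

    points: list of (x, y, z); words: visual-word index for each point.
    The working cube [lo, lo + size]^3 is split into 2**level cells per axis.
    """
    n = 2 ** level
    hist = Counter()
    for (x, y, z), w in zip(points, words):
        # Clamp so that points on the cube boundary fall in a valid cell.
        cell = tuple(
            min(n - 1, max(0, int((c - lo) / size * n))) for c in (x, y, z)
        )
        hist[(cell, w)] += 1
    return hist


def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def pyramid_match_3d(pts_a, words_a, pts_b, words_b, levels=2, lo=0.0, size=1.0):
    """Weighted sum of per-level intersections (SPM-style weights, assumed)."""
    total = 0.0
    for level in range(levels + 1):
        ha = cell_histograms(pts_a, words_a, level, lo, size)
        hb = cell_histograms(pts_b, words_b, level, lo, size)
        # Assumed weighting: level 0 gets 1/2**L, level l >= 1 gets 1/2**(L - l + 1).
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        total += weight * intersection(ha, hb)
    return total
```

With these weights, finer levels count more, so two clouds that share the same visual words but place them in different sub-volumes score lower than two clouds that agree spatially. A selective variant would skip cells that a separate selection step has discarded; that step is left out here because the abstract does not specify it.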
