Spherical Blurred Shape Model for 3-D Object and Pose Recognition: Quantitative Analysis and HCI Applications in Smart Environments

Interest in depth maps has grown since the advent of cheap multisensor devices based on structured light, such as Kinect. In this context, there is a strong need for powerful 3-D shape descriptors able to generate rich object representations. Although several 3-D descriptors have already been proposed in the literature, the search for discriminative and computationally efficient descriptors is still an open issue. In this paper, we propose a novel point cloud descriptor called spherical blurred shape model (SBSM) that encodes the structure density and local variabilities of an object based on shape voxel distances and a neighborhood propagation strategy. The proposed SBSM is shown to be rotation and scale invariant, robust to noise and occlusions, highly discriminative for multiple categories of complex objects like the human hand, and computationally efficient, since the SBSM complexity is linear in the number of object voxels. Experimental evaluation on public depth multiclass object data, 3-D facial expression data, and a novel hand pose data set shows significant performance improvements over state-of-the-art approaches. Moreover, the effectiveness of the proposal is also demonstrated for object spotting in 3-D scenes and for real-time automatic hand pose recognition in human-computer interaction scenarios.
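The following is a minimal sketch of the general idea summarized in the abstract, not the authors' implementation. It assumes a spherical grid of radial, azimuth, and elevation bins centered on the cloud centroid, and it lets each point vote for its own bin and the adjacent bins with weights inversely proportional to the distance to each bin center. The function name `sbsm_sketch`, the bin counts, and the weighting rule are hypothetical choices made for illustration. The sketch works on raw points rather than voxels and covers only translation and scale normalization; the rotation alignment mentioned in the abstract is omitted.

```python
import itertools
import numpy as np


def sbsm_sketch(points, n_r=4, n_theta=8, n_phi=4, eps=1e-9):
    """Illustrative spherical descriptor with distance-weighted blurring.

    Hypothetical sketch: bin layout and weights are assumptions, not the
    paper's exact definition. Cost is 27 passes over the points, so it is
    linear in the number of points.
    """
    pts = np.asarray(points, dtype=float)
    c = pts - pts.mean(axis=0)                 # centre on centroid
    r = np.linalg.norm(c, axis=1)
    r_max = r.max()
    if r_max == 0.0:
        raise ValueError("point cloud has no spatial extent")
    c, r = c / r_max, r / r_max                # unit sphere (scale normalization)

    theta = np.arctan2(c[:, 1], c[:, 0])       # azimuth in [-pi, pi]
    phi = np.arccos(np.clip(c[:, 2] / np.maximum(r, eps), -1.0, 1.0))  # [0, pi]

    # Index of the bin that contains each point.
    ir = np.minimum((r * n_r).astype(int), n_r - 1)
    it = np.minimum(((theta + np.pi) / (2 * np.pi) * n_theta).astype(int),
                    n_theta - 1)
    ip = np.minimum((phi / np.pi * n_phi).astype(int), n_phi - 1)

    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    weights = np.zeros((len(offsets), len(pts)))
    flat = np.zeros((len(offsets), len(pts)), dtype=int)
    for k, (dr, dt, dp) in enumerate(offsets):
        jr, jt, jp = ir + dr, (it + dt) % n_theta, ip + dp  # azimuth wraps
        valid = (jr >= 0) & (jr < n_r) & (jp >= 0) & (jp < n_phi)
        jr, jp = np.clip(jr, 0, n_r - 1), np.clip(jp, 0, n_phi - 1)
        # Cartesian centre of the neighbouring bin.
        rc = (jr + 0.5) / n_r
        tc = -np.pi + (jt + 0.5) * 2 * np.pi / n_theta
        pc = (jp + 0.5) * np.pi / n_phi
        centre = np.stack([rc * np.sin(pc) * np.cos(tc),
                           rc * np.sin(pc) * np.sin(tc),
                           rc * np.cos(pc)], axis=1)
        d = np.linalg.norm(c - centre, axis=1)
        weights[k] = np.where(valid, 1.0 / (d + eps), 0.0)
        flat[k] = (jr * n_theta + jt) * n_phi + jp

    weights /= weights.sum(axis=0)             # each point casts one unit vote
    hist = np.zeros(n_r * n_theta * n_phi)
    np.add.at(hist, flat.ravel(), weights.ravel())
    return hist / hist.sum()                   # normalize to unit sum
```

Calling `sbsm_sketch(cloud)` on an `(N, 3)` array returns a vector of length `n_r * n_theta * n_phi` that sums to one. Spreading each vote over neighboring bins is what makes the histogram "blurred": small perturbations of a point shift weight gradually between bins instead of flipping a single count.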
