Multi-modal descriptors for multi-class hand pose recognition in human computer interaction systems

Hand pose recognition in advanced Human-Computer Interaction (HCI) systems is becoming more feasible thanks to affordable multi-modal RGB-Depth cameras. The depth data generated by these sensors is a valuable input, although the representation used for 3D descriptors remains a critical step in obtaining robust object descriptions. This paper presents an overview of different multi-modal descriptors and provides a comparative study of two feature descriptors, Multi-modal Hand Shape (MHS) and Fourier-based Hand Shape (FHS), which compute local and global 2D-3D hand shape statistics to robustly describe hand poses. A new dataset of 38K hand poses, covering five hand shape categories recorded from eight users, has been created for real-time hand pose and gesture recognition. Experimental results show good performance of the fused MHS and FHS descriptors, which improve recognition accuracy while ensuring real-time computation in HCI scenarios.
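The abstract does not define MHS or FHS, so the following is only a minimal sketch of the general idea of fusing a Fourier-based 2D shape descriptor with a depth-based statistic. It is not the authors' method. The function names (`fourier_descriptor`, `depth_histogram`, `fuse`), the number of harmonics, the depth histogram used as a stand-in 3D cue, and fusion by concatenation are all assumptions made for illustration.

```python
import numpy as np


def fourier_descriptor(contour_xy, n_harmonics=4):
    """Generic contour Fourier descriptor (hypothetical, not the paper's FHS).

    contour_xy: (N, 2) array of ordered boundary points, N > 2 * n_harmonics.
    Returns 2 * n_harmonics magnitudes that are invariant to translation,
    rotation, uniform scale and the choice of starting point.
    """
    pts = np.asarray(contour_xy, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]          # contour as a complex signal
    spectrum = np.fft.fft(z)
    # Skip index 0 (the DC term), which only encodes the contour position.
    pos = np.abs(spectrum[1:n_harmonics + 1])   # harmonics +1 .. +n
    neg = np.abs(spectrum[-n_harmonics:])       # harmonics -n .. -1
    mags = np.concatenate([pos, neg])
    # Magnitudes discard phase (rotation, start point); dividing by the
    # largest magnitude removes uniform scale.
    scale = mags.max()
    return mags / scale if scale > 0 else mags


def depth_histogram(depth_values, n_bins=8):
    """Normalised histogram of depths inside the hand region (assumed 3D cue)."""
    d = np.asarray(depth_values, dtype=float)
    d = d - d.min()
    span = d.max()
    if span > 0:
        d = d / span                         # map depths to [0, 1]
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()


def fuse(*descriptors):
    """Early fusion by concatenation (one common choice, assumed here)."""
    return np.concatenate([np.ravel(d) for d in descriptors])


if __name__ == "__main__":
    theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    contour = np.stack([2.0 * np.cos(theta), np.sin(theta)], axis=1)
    depths = np.linspace(0.4, 0.9, 100)      # synthetic depth samples
    feature = fuse(fourier_descriptor(contour), depth_histogram(depths))
    print(feature.shape)
```

The fused vector could then be passed to any multi-class classifier. Concatenation is used here only because it is the simplest fusion scheme; the paper may combine its descriptors differently.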
