Single-frame hand gesture recognition using color and depth kernel descriptors

This paper presents a flexible method for single-frame hand gesture recognition by fusing information from color and depth images. Existing methods usually focus on designing intuitive features for color and depth images. On the contrary, our method first extracts common patch-level features, and fuses them by means of kernel descriptors. Linear SVM is then adopted to predict the class label efficiently. In our experiments on two American Sign Language (ASL) datasets, we demonstrate that our approach recognizes each sign accurately with only a small number of training samples, and is robust to the change of distance between the hand and the camera.

[1]  Nicolas Pugeault,et al.  Spelling it out: Real-time ASL fingerspelling recognition , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[2]  Luc Van Gool,et al.  Real-time sign language letter and word recognition from depth data , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[3]  Jürgen Schmidhuber,et al.  Multi-column deep neural network for traffic sign classification , 2012, Neural Networks.

[4]  Stan Sclaroff,et al.  Estimating 3D hand pose from a cluttered image , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[5]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  David J. Fleet,et al.  Model-based hand tracking with texture, shading and self-occlusions , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Cristian Sminchisescu,et al.  Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.

[8]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Stan Sclaroff,et al.  An appearance-based framework for 3D hand shape classification and camera viewpoint estimation , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[10]  Björn Stenger,et al.  Model-based hand tracking using a hierarchical Bayesian filter , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[12]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[13]  Dieter Fox,et al.  Kernel Descriptors for Visual Recognition , 2010, NIPS.

[14]  Luc Van Gool,et al.  Real time head pose estimation with random regression forests , 2011, CVPR 2011.

[15]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.