Sign Language Fingerspelling Classification from Depth and Color Images Using a Deep Belief Network

Automatic sign language recognition is an open problem that has recently received considerable attention, not only because of its usefulness to signers, but also due to the numerous applications a sign classifier can have. In this article, we present a new feature extraction technique for hand pose recognition using depth and intensity images captured from a Microsoft Kinect sensor. We applied our technique to American Sign Language fingerspelling classification using a Deep Belief Network, for which our feature extraction technique is tailored. We evaluated our results on a multi-user data set under two scenarios: one with all users seen during training and one with an unseen user. We achieved 99% recall and precision in the first scenario, and 77% recall and 79% precision in the second. Our method is also capable of real-time sign classification and is robust to changes in environment and lighting intensity.
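The abstract describes a pipeline that segments the hand from Kinect depth and intensity frames and turns it into a fixed-size feature vector for a classifier. The sketch below illustrates one plausible version of such a front end; the depth thresholds, patch size, and function name are assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def extract_hand_features(depth, intensity, near=500, far=900, size=32):
    """Hedged sketch of a depth-based hand feature extractor.

    Assumes the hand is the object lying in a known depth band
    (near..far, in millimetres). Crops the hand's bounding box from
    both channels, resizes each to a fixed size x size grid, scales
    values to [0, 1], and concatenates them into one feature vector
    suitable as input to a classifier such as a Deep Belief Network.
    """
    mask = (depth > near) & (depth < far)  # hand assumed to be nearest blob
    if not mask.any():
        return None  # no hand found in the depth band
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

    def crop_resize(img):
        patch = img[y0:y1, x0:x1].astype(np.float64)
        # nearest-neighbour resize to a fixed grid (no external deps)
        ri = np.linspace(0, patch.shape[0] - 1, size).astype(int)
        ci = np.linspace(0, patch.shape[1] - 1, size).astype(int)
        patch = patch[np.ix_(ri, ci)]
        rng = patch.max() - patch.min()
        return (patch - patch.min()) / rng if rng else patch * 0.0

    # background suppressed via the mask, then both channels flattened
    return np.concatenate([crop_resize(depth * mask).ravel(),
                           crop_resize(intensity * mask).ravel()])
```

With a 32x32 patch per channel this yields a 2048-dimensional vector in [0, 1], a range that suits the binary-unit restricted Boltzmann machines typically stacked to form a Deep Belief Network.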
