Real-Time Sign Language Recognition Using a Consumer Depth Camera

Gesture recognition remains a very challenging task in the field of computer vision and human-computer interaction (HCI). A decade ago the task seemed almost unsolvable with the data provided by a single RGB camera. Recent advances in sensing technologies, such as time-of-flight and structured-light cameras, have made new data sources available that make hand gesture recognition more feasible. In this work, we propose a highly precise method to recognize static gestures from depth data provided by one of the above-mentioned devices. The depth images are used to derive rotation-, translation- and scale-invariant features. A multi-layered random forest (MLRF) is then trained to classify the feature vectors, which yields the recognized hand signs. The training time and memory required by the MLRF are much smaller than those of a simple random forest with equivalent precision. This makes it possible to repeat the MLRF training procedure without significant effort. To show the advantages of our technique, we evaluate our algorithm on synthetic data, on a publicly available dataset containing 24 signs from American Sign Language (ASL), and on a new dataset collected using the recently released Intel Creative Gesture Camera.
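
To make the notion of rotation-, translation- and scale-invariant features concrete, the following is a minimal sketch and not the descriptor used in this work, which the abstract does not specify. It assumes the hand is given as a small set of 3D points and uses a normalized histogram of pairwise distances as a stand-in feature. The names `invariant_descriptor` and `transform` and the bin count are hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def invariant_descriptor(points, bins=8):
    """Histogram of pairwise distances, normalized by the largest distance.

    Illustrative stand-in only (an assumption, not the paper's feature).
    Pairwise distances do not change under rotation or translation, and
    dividing by the largest distance removes the effect of uniform scale.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    if not dists:
        raise ValueError("need at least two points")
    d_max = max(dists)
    if d_max == 0.0:
        raise ValueError("points must not all coincide")
    hist = [0] * bins
    for d in dists:
        # Ratios lie in [0, 1]; a ratio of exactly 1.0 goes into the last bin.
        idx = min(int(d / d_max * bins), bins - 1)
        hist[idx] += 1
    total = len(dists)
    return [count / total for count in hist]


def transform(points, angle, shift, scale):
    """Rotate about the z-axis, scale uniformly, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + shift[0],
         scale * (s * x + c * y) + shift[1],
         scale * z + shift[2])
        for x, y, z in points
    ]
```

Calling `invariant_descriptor` on a point set and on the output of `transform` applied to the same set should give the same histogram, up to floating-point effects at bin boundaries. A feature vector of this kind is the sort of input a classifier such as a random forest would then be trained on.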
