CNN+RNN Depth and Skeleton based Dynamic Hand Gesture Recognition

Human activity and gesture recognition is an important component of the rapidly growing domain of ambient intelligence, in particular in assisted living and smart homes. In this paper, we propose to combine the power of two deep learning techniques, convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for automated hand gesture recognition using both depth and skeleton data. Each of these data types can be used separately to train neural networks to recognize hand gestures. RNNs have previously been reported to perform well at recognizing the sequence of movements of each skeleton joint given skeleton information alone. This study additionally uses depth data and applies CNNs to extract important spatial information from the depth images. Together, the tandem CNN+RNN is capable of recognizing a sequence of gestures more accurately. In addition, several types of fusion are studied to combine skeleton and depth information in order to extract spatio-temporal information. An overall accuracy of 85.46% is achieved on the Dynamic Hand Gesture 14/28 (DHG-14/28) dataset.
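The tandem architecture described above can be illustrated with a minimal, dependency-free sketch. It assumes a per-frame CNN encoder (one convolution layer, ReLU, global average pooling) over each depth image, feature-level fusion by concatenating those features with the flattened 3D skeleton joints of the same frame, and a plain Elman RNN whose final hidden state feeds a linear softmax classifier. All function names, layer sizes and the choice of fusion point are hypothetical illustrations, not the authors' architecture, and the weights are random and untrained.

```python
import math
import random


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation, stride 1, no padding ('valid')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def cnn_frame_features(depth_frame, kernels):
    """Hypothetical spatial encoder: conv -> ReLU -> global average pooling (one value per kernel)."""
    feats = []
    for k in kernels:
        fmap = conv2d_valid(depth_frame, k)
        relu = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(relu) / len(relu))
    return feats


def matvec(w, x):
    """Matrix-vector product for lists of lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rnn_last_hidden(sequence, w_xh, w_hh, hidden_size):
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1}); returns the final hidden state."""
    h = [0.0] * hidden_size
    for x in sequence:
        pre = [a + b for a, b in zip(matvec(w_xh, x), matvec(w_hh, h))]
        h = [math.tanh(v) for v in pre]
    return h


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def classify_gesture(depth_frames, skeleton_frames, num_classes, hidden_size=8, seed=0):
    """Feature-level fusion sketch: per frame, concatenate CNN depth features with the
    flattened (x, y, z) joint coordinates, run the RNN over time, then linear + softmax.
    Weights are random (untrained), so the output only demonstrates the data flow."""
    rng = random.Random(seed)
    kernels = [rand_matrix(rng, 3, 3, 1.0) for _ in range(4)]
    fused = [
        cnn_frame_features(d, kernels) + [c for joint in s for c in joint]
        for d, s in zip(depth_frames, skeleton_frames)
    ]
    input_size = len(fused[0])
    w_xh = rand_matrix(rng, hidden_size, input_size)
    w_hh = rand_matrix(rng, hidden_size, hidden_size)
    w_out = rand_matrix(rng, num_classes, hidden_size)
    h = rnn_last_hidden(fused, w_xh, w_hh, hidden_size)
    return softmax(matvec(w_out, h))


if __name__ == "__main__":
    rng = random.Random(42)
    T, H, W, J = 5, 8, 8, 22  # frames, depth height/width, joints (hypothetical sizes)
    depth = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
    skel = [[[rng.random() for _ in range(3)] for _ in range(J)] for _ in range(T)]
    probs = classify_gesture(depth, skel, num_classes=14)
    print(len(probs), round(sum(probs), 6))  # 14 1.0
```

Concatenating per-frame features before the RNN is only one possible fusion point; the same components could instead run two separate streams (depth CNN+RNN and skeleton RNN) whose class scores are combined afterwards, which is the kind of alternative a fusion study would compare.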