Deep Convolutional Network with Long Short-Term Memory Layers for Dynamic Gesture Recognition

A framework based on a convolutional neural network (CNN) with added long short-term memory (LSTM) layers is presented for recognizing hand gestures from a video stream in real time. Existing deep learning models for gesture recognition are reviewed and analyzed. The task was to classify hand gestures using deep convolutional neural networks and to obtain a simple, precise, and resource-efficient system for visual recognition of letters and digits in sign language. The model is robust to fairly wide angles of hand rotation and, because it uses contour patterns, is independent of lighting. In experiments with the CNN, an accuracy of 98.46% was obtained on the test dataset.
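To illustrate why a contour-based input can be insensitive to lighting, the following minimal sketch extracts a contour map from a synthetic grayscale frame and checks that a uniform brightness shift leaves it unchanged. The abstract does not say how the contour patterns are computed, so the gradient-threshold method, the function names `contour_map` and `brighten`, and the threshold value are all assumptions made for illustration. The sketch also assumes pixel values do not saturate.

```python
from typing import List

Frame = List[List[float]]


def contour_map(frame: Frame, threshold: float = 20.0) -> List[List[int]]:
    """Mark pixels whose forward-difference gradient exceeds a threshold.

    Hypothetical stand-in for the contour extraction step; the method
    actually used by the authors is not specified in the abstract.
    """
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences, clamped at the image border.
            gx = frame[y][min(x + 1, w - 1)] - frame[y][x]
            gy = frame[min(y + 1, h - 1)][x] - frame[y][x]
            if abs(gx) + abs(gy) > threshold:
                out[y][x] = 1
    return out


def brighten(frame: Frame, offset: float) -> Frame:
    """Simulate a uniform lighting change by adding a constant offset."""
    return [[p + offset for p in row] for row in frame]


# Synthetic 6x6 frame: a bright 2x2 square (the "hand") on a dark background.
frame = [
    [200.0 if 2 <= y <= 3 and 2 <= x <= 3 else 50.0 for x in range(6)]
    for y in range(6)
]

contours_original = contour_map(frame)
contours_brighter = contour_map(brighten(frame, 40.0))

# A constant offset cancels in every pixel difference, so the maps match.
print(contours_original == contours_brighter)
```

Only a uniform additive change is covered here; shadows or non-uniform illumination would need a more careful treatment than this sketch provides.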
