Dynamic hand gesture recognition based on short-term sampling neural networks

Hand gestures are a natural means of human-robot interaction. Vision-based dynamic hand gesture recognition has become an active research topic because of its wide range of applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. One frame is randomly selected from each group and represented as both an RGB image and an optical flow snapshot. These two representations are fused and fed into a convolutional neural network (ConvNet) for feature extraction. The ConvNets for all groups share parameters. To learn long-term features, the outputs from all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model has been tested on two popular hand gesture datasets, namely the Jester dataset and the Nvidia dataset. Compared with other models, our model produced very competitive results. The robustness of the new model has also been demonstrated on an augmented dataset with enhanced diversity of hand gestures.
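The short-term sampling step described above can be illustrated with a minimal Python sketch. It covers only the frame selection, not the ConvNet or LSTM stages. The function name `sample_frame_indices`, its parameters, and the choice of contiguous, near-equal groups are assumptions made for illustration; the abstract states only that the video is split into a fixed number of groups and that one frame is drawn at random from each.

```python
import random
from typing import List, Optional


def sample_frame_indices(num_frames: int, num_groups: int,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Pick one random frame index from each of `num_groups` contiguous groups.

    Hypothetical helper: the grouping rule (contiguous, near-equal sizes)
    is an assumption, not taken from the paper.
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    if num_frames < num_groups:
        raise ValueError("need at least one frame per group")
    rng = rng or random.Random()
    indices = []
    for g in range(num_groups):
        # Group g covers frames [start, end). Integer arithmetic keeps the
        # groups disjoint, with sizes that differ by at most one frame.
        start = g * num_frames // num_groups
        end = (g + 1) * num_frames // num_groups
        indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # Example: a 37-frame clip split into 8 groups, one index per group.
    print(sample_frame_indices(37, 8, random.Random(0)))
```

Because exactly one frame is taken per group, the number of frames passed on for feature extraction depends only on the number of groups and not on the clip length, which is how this kind of sampling keeps computation bounded for long videos.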
