Improving Real-Time Hand Gesture Recognition with Semantic Segmentation

Hand gesture recognition (HGR) plays a central role in human–computer interaction, with applications in the automotive sector, consumer electronics, home automation, and other areas. In recent years, accurate and efficient deep learning models have been proposed for real-time applications. However, the most accurate approaches tend to employ multiple modalities derived from the RGB input frames, such as optical flow. This practice limits real-time performance because of the substantial extra computational cost. In this paper, we avoid the optical flow computation by proposing a real-time hand gesture recognition method based on RGB frames combined with hand segmentation masks. We employ a lightweight semantic segmentation method (FASSD-Net) to boost the accuracy of two efficient HGR methods: Temporal Segment Networks (TSN) and Temporal Shift Modules (TSM). We demonstrate the efficiency of the proposal on our IPN Hand dataset, which includes thirteen different gestures focused on interaction with touchless screens. The experimental results show that our approach significantly improves on the accuracy of the original TSN and TSM algorithms while keeping real-time performance.
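The two ingredients named above can be illustrated with a small NumPy sketch. It rests on assumptions: the abstract does not say how the segmentation masks are combined with the RGB frames, so appending the mask as a fourth channel is only one plausible choice, and the function names `fuse_rgb_and_mask` and `temporal_shift` are hypothetical. The shift function follows the general idea of a temporal shift (moving a fraction of the channels one step along the time axis with zero padding) and is not taken from the authors' code.

```python
import numpy as np


def fuse_rgb_and_mask(frames, masks):
    """Append a hand mask to each RGB frame as a fourth channel.

    Assumption: channel concatenation is used for fusion; the abstract
    does not specify the fusion scheme.
    frames: (T, H, W, 3) array, masks: (T, H, W) array of 0/1 values.
    """
    mask_channel = masks[..., None].astype(frames.dtype)
    return np.concatenate([frames, mask_channel], axis=-1)


def temporal_shift(features, fold_div=8):
    """Shift a fraction of the channels along the time axis.

    features: (T, C, H, W) array. The first C // fold_div channels take
    their values from the previous frame, the next C // fold_div channels
    take theirs from the following frame, and the rest stay in place.
    Positions with no source frame are zero-filled.
    """
    channels = features.shape[1]
    fold = channels // fold_div
    out = np.zeros_like(features)
    out[1:, :fold] = features[:-1, :fold]                  # from frame t-1
    out[:-1, fold:2 * fold] = features[1:, fold:2 * fold]  # from frame t+1
    out[:, 2 * fold:] = features[:, 2 * fold:]             # unchanged
    return out


if __name__ == "__main__":
    clip = np.random.rand(8, 32, 32, 3).astype(np.float32)  # toy 8-frame clip
    hand = np.ones((8, 32, 32), dtype=np.uint8)             # toy all-hand masks
    print(fuse_rgb_and_mask(clip, hand).shape)

    feats = np.random.rand(8, 16, 4, 4).astype(np.float32)
    print(temporal_shift(feats).shape)
```

The shift itself only moves data and adds no multiplications, which is the kind of low-cost temporal modelling that suits a real-time pipeline.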
