Global and Local Spatial-Attention Network for Isolated Gesture Recognition

In this paper, we focus on isolated gesture recognition from RGB-D videos. Our main idea is to design an algorithm that can extract both global and local information from multi-modality inputs. To this end, we propose a novel attention-based method built on a 3D convolutional neural network (CNN) for isolated gesture recognition. It consists of two parts. The first is a global and local spatial-attention network (GLSANet), which extracts effective features from multi-modality inputs by combining global information, which captures the context of the whole frame, with local information, which focuses on the hand and arm actions of the person. The second is an adaptive model fusion strategy that fuses the predicted probabilities from the different modalities. Experiments demonstrate that the proposed method achieves state-of-the-art performance on the IsoGD dataset.
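The abstract does not say how the adaptive fusion weights are obtained. The following is a minimal sketch of late fusion over per-modality class probabilities. It assumes that each modality's weight is its validation accuracy normalized over all modalities. This is an illustrative choice and not necessarily the rule used with GLSANet. The function names `adaptive_fusion` and `predict_label`, as well as the toy scores, are hypothetical.

```python
from typing import Dict, List


def adaptive_fusion(probs: Dict[str, List[float]],
                    val_acc: Dict[str, float]) -> List[float]:
    """Late fusion of per-modality class-probability vectors.

    ASSUMPTION: each modality is weighted by its validation accuracy,
    normalized over all modalities. This is an illustrative scheme,
    not necessarily the paper's fusion rule.
    """
    if not probs:
        raise ValueError("need at least one modality")
    num_classes = len(next(iter(probs.values())))
    if any(len(p) != num_classes for p in probs.values()):
        raise ValueError("all modalities must score the same classes")
    total = sum(val_acc[m] for m in probs)
    if total <= 0:
        raise ValueError("validation accuracies must sum to a positive value")

    fused = [0.0] * num_classes
    for modality, p in probs.items():
        weight = val_acc[modality] / total  # normalized weight in [0, 1]
        for c in range(num_classes):
            fused[c] += weight * p[c]
    return fused


def predict_label(fused: List[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores for a 3-class toy problem (not IsoGD results).
    scores = {"rgb": [0.7, 0.2, 0.1], "depth": [0.2, 0.6, 0.2]}
    accuracy = {"rgb": 0.6, "depth": 0.4}
    fused = adaptive_fusion(scores, accuracy)
    print([round(x, 2) for x in fused], predict_label(fused))
```

Weighting by validation accuracy lets the stronger modality dominate the decision without discarding the weaker one. A learned weighting would keep the same structure, with trained weights in place of `val_acc`.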
