Paper Information - Global and Local Spatial-Attention Network for Isolated Gesture Recognition

In this paper, we focus on isolated gesture recognition from RGB-D videos. Our main idea is to design an algorithm that extracts both global and local information from multi-modality inputs. To this end, we propose a novel attention-based method built on a 3D convolutional neural network (CNN) for isolated gesture recognition. The method consists of two parts. The first is a global and local spatial-attention network (GLSANet), which extracts efficient features from the multi-modality inputs simultaneously by combining global information, which focuses on the context of the frame, with local information, which focuses on the hand/arm actions of the person. The second is an adaptive model fusion strategy that fuses the predicted probabilities obtained from the multi-modality inputs. Experiments demonstrate that the proposed method achieves state-of-the-art performance on the IsoGD dataset.
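The abstract does not say how the adaptive fusion weights are obtained, so the following is only a minimal sketch of the general idea of fusing per-modality predicted probabilities. It assumes a simple weighted average in which each modality's weight is supplied by the caller (for example, taken from validation accuracy). The function names `fuse_probabilities` and `predict`, the modality keys, and the example numbers are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def fuse_probabilities(
    probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality class probabilities.

    Assumption: weights are given by the caller; the paper's actual
    adaptive weighting rule is not described in the abstract.
    """
    total = sum(weights[m] for m in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for modality, p in probs.items():
        if len(p) != n_classes:
            raise ValueError(f"class count mismatch for modality {modality!r}")
        w = weights[modality] / total  # normalise so the weights sum to 1
        for i, value in enumerate(p):
            fused[i] += w * value
    return fused


def predict(fused: List[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs for one video from two modality-specific models.
probs = {
    "rgb": [0.6, 0.3, 0.1],
    "depth": [0.2, 0.7, 0.1],
}
# Hypothetical weights; here the depth model is trusted three times as much.
weights = {"rgb": 0.5, "depth": 1.5}

fused = fuse_probabilities(probs, weights)
label = predict(fused)
```

Because the weights are normalised before averaging, the fused vector remains a valid probability distribution whenever each input is one, and a modality with a larger weight pulls the final decision toward its own prediction.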
