ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition

The ChaLearn large-scale gesture recognition challenge has been run twice, in two workshops held in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and the International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams from around the world. The challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This article describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition achieved on them. We discuss the challenges of collecting large-scale ground-truth annotations for gesture recognition and provide a detailed analysis of current methods for large-scale isolated and continuous gesture recognition. In addition to the recognition rate and the mean Jaccard index (MJI), the evaluation metrics used in the previous challenges, we introduce the corrected segmentation rate (CSR) metric to evaluate the temporal segmentation performance of continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) method that determines video division points from skeleton points. Experiments show that the proposed Bi-LSTM outperforms state-of-the-art methods, raising CSR from 0.8917 to 0.9639, a relative improvement of 8.1% (7.22 percentage points absolute).
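The abstract names the evaluation metrics but does not define them. As a rough illustration only, the sketch below computes a per-video Jaccard index, its mean over videos (MJI), and a CSR-style segmentation score. It assumes that ground truth and predictions are lists of inclusive (start, end, label) frame ranges, that the per-video Jaccard index averages over labels appearing in either list, and that CSR counts one-to-one segment matches whose temporal IoU reaches a threshold, divided by the larger of the two segment counts. These conventions, the helper names, and the 0.7 default threshold are assumptions for illustration, not the official challenge scoring code.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: inclusive frame range plus gesture label.
Segment = Tuple[int, int, int]  # (start_frame, end_frame, label)
Span = Tuple[int, int]          # (start_frame, end_frame), label-agnostic


def frames_by_label(segments: List[Segment]) -> Dict[int, Set[int]]:
    """Map each gesture label to the set of frame indices it covers."""
    out: Dict[int, Set[int]] = {}
    for start, end, label in segments:
        out.setdefault(label, set()).update(range(start, end + 1))
    return out


def sequence_jaccard(truth: List[Segment], pred: List[Segment]) -> float:
    """Per-video Jaccard index, averaged over labels seen in either list (assumed convention)."""
    gt, pr = frames_by_label(truth), frames_by_label(pred)
    labels = set(gt) | set(pr)
    if not labels:
        return 1.0
    total = 0.0
    for lab in labels:
        a, b = gt.get(lab, set()), pr.get(lab, set())
        total += len(a & b) / len(a | b)  # |A ∩ B| / |A ∪ B| for this label
    return total / len(labels)


def mean_jaccard(videos: List[Tuple[List[Segment], List[Segment]]]) -> float:
    """MJI: average of the per-video Jaccard indices."""
    return sum(sequence_jaccard(t, p) for t, p in videos) / len(videos)


def segment_iou(a: Span, b: Span) -> float:
    """Temporal IoU of two inclusive frame ranges."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def corrected_segmentation_rate(truth: List[Span], pred: List[Span],
                                iou_threshold: float = 0.7) -> float:
    """Hypothetical CSR form: greedy one-to-one matches with IoU >= threshold,
    divided by max(#truth, #pred). The 0.7 default is an assumption."""
    if not truth and not pred:
        return 1.0
    used: Set[int] = set()
    hits = 0
    for t in truth:
        for k, p in enumerate(pred):
            if k not in used and segment_iou(t, p) >= iou_threshold:
                used.add(k)  # each predicted segment may match only once
                hits += 1
                break
    return hits / max(len(truth), len(pred))
```

For example, a prediction that places both segments of a two-gesture video at the correct frames but gives the second one the wrong label scores a Jaccard index of 1/3 under this convention, while the label-agnostic CSR stays at 1.0. That contrast is one reason a separate segmentation metric can be informative alongside MJI.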