Gesture Intention Understanding Based on Depth and RGB Data

Gesture recognition based on color images alone is strongly affected by environmental factors such as lighting, so we propose a gesture intention understanding method that fuses Red-Green-Blue (RGB) data with depth data. First, foreground segmentation isolates the hand region, and gesture features are extracted with the Speeded Up Robust Features (SURF) method. A Backpropagation (BP) neural network then classifies the gestures in each stream. The final recognition result is obtained by fusing the classification results from the RGB images and the depth images at the decision level. We evaluate the effectiveness of the proposed method on the ChaLearn Gesture Database.
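As a minimal sketch of the first stage, the snippet below builds a foreground mask from the depth frame and runs SURF on the masked color image. It assumes OpenCV with the nonfree xfeatures2d module (opencv-contrib-python built with OPENCV_ENABLE_NONFREE); the depth band, the Hessian threshold, and the helper name extract_gesture_features are illustrative choices, not values taken from the paper.

```python
# Sketch: depth-guided foreground segmentation followed by SURF feature
# extraction. Requires OpenCV with the nonfree xfeatures2d module.
import cv2
import numpy as np

def extract_gesture_features(rgb, depth, near_mm=300, far_mm=1200,
                             hessian_threshold=400):
    """Return SURF keypoints and descriptors restricted to the hand region.

    rgb   : HxWx3 uint8 color frame
    depth : HxW   uint16 depth frame in millimetres
    The depth band [near_mm, far_mm] is an assumed working range for the
    hand's distance from the sensor; the paper does not give a value.
    """
    # Foreground mask: keep pixels whose depth falls within the hand's range.
    mask = cv2.inRange(depth, near_mm, far_mm)
    # Remove speckle noise typical of consumer depth sensors.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    # The mask restricts keypoint detection to the segmented foreground.
    keypoints, descriptors = surf.detectAndCompute(gray, mask)
    return keypoints, descriptors
```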
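The classification and fusion stages can be sketched in the same spirit. The paper does not specify how the variable-length SURF descriptor sets become fixed-length network inputs, so a bag-of-visual-words histogram is assumed here, and scikit-learn's MLPClassifier (which trains by backpropagation) stands in for the BP network. The synthetic data, fusion weight w, and all hyperparameters are illustrative.

```python
# Sketch: bag-of-visual-words encoding, per-stream BP-network classification,
# and decision-level fusion of the RGB and depth results.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.neural_network import MLPClassifier  # trained via backpropagation

rng = np.random.default_rng(0)

def bow_histogram(descriptors, codebook):
    """Quantize a variable-length SURF descriptor set (n x 64) into a
    fixed-length visual-word histogram so it can feed the BP network."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Stand-in data: per-sample SURF descriptor arrays (64-D) for each stream.
n_samples, n_classes = 40, 4
y = rng.integers(0, n_classes, n_samples)
desc_rgb = [rng.normal(y[i], 1.0, (30, 64)) for i in range(n_samples)]
desc_depth = [rng.normal(y[i], 1.2, (30, 64)) for i in range(n_samples)]

# Shared visual-word codebook learned from both streams' descriptors.
codebook = MiniBatchKMeans(n_clusters=32, n_init=3, random_state=0)
codebook.fit(np.vstack(desc_rgb + desc_depth))

H_rgb = np.array([bow_histogram(d, codebook) for d in desc_rgb])
H_depth = np.array([bow_histogram(d, codebook) for d in desc_depth])

# One BP network per stream.
clf_rgb = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                        random_state=0).fit(H_rgb, y)
clf_depth = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                          random_state=0).fit(H_depth, y)

# Decision-level fusion: weighted sum of the two class posteriors.
w = 0.5  # illustrative weight; tune on validation data
proba = (w * clf_rgb.predict_proba(H_rgb)
         + (1 - w) * clf_depth.predict_proba(H_depth))
fused_prediction = clf_rgb.classes_[proba.argmax(axis=1)]
```

Fusing the two posteriors rather than the raw features keeps a usable prediction when either stream degrades, for example when lighting changes corrupt the RGB channel but leave the depth channel intact.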
