Natural Gesture Extraction Based on Hand Trajectory

Automatic natural gesture recognition can be useful both for the development of human-robot applications and as an aid in the study of human gesture. The goal of this study is to recognize natural gestures using only an RGB video, without machine learning methods. To develop and test the proposed method, we recorded videos in which a speaker gestured naturally but in a controlled way. The advantage of this recording approach over lab-recorded data is that the data contain the variations in gestures that are typically encountered when analyzing TV news or speech videos on the Internet. The hand positions are computed by a pose estimation method, and we recognize the gestures based on the hand trajectories, assuming that a gesturing hand does not change its direction abruptly during each phase of a gesture. Measured against ground-truth annotations provided by linguistic experts, the accuracies were 92.15%, 91.76%, and 75.81% for the three selected natural gestures.
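
The assumption that a hand does not change direction abruptly within a gesture phase suggests one simple way to split a trajectory into candidate phases: cut it wherever the direction of motion turns sharply. The following is a minimal sketch of that idea only, not the procedure used in this study. The function name `segment_by_direction`, the 60-degree turn threshold, and the `min_step` stillness threshold are all hypothetical choices made for illustration, and the input is assumed to be one 2D hand position per video frame, as a pose estimator might provide.

```python
import math


def segment_by_direction(points, max_turn_deg=60.0, min_step=1e-6):
    """Split a 2D hand trajectory at abrupt changes of direction.

    points: list of (x, y) hand positions, one per frame.
    max_turn_deg: hypothetical threshold; a larger turn starts a new segment.
    min_step: displacements shorter than this are treated as "no movement".
    Returns a list of (start_index, end_index) pairs, inclusive.
    """
    if len(points) < 2:
        return []

    segments = []
    start = 0
    prev_dir = None  # direction of the last non-negligible movement

    for i in range(1, len(points)):
        dx = points[i][0] - points[i - 1][0]
        dy = points[i][1] - points[i - 1][1]
        if math.hypot(dx, dy) < min_step:
            continue  # hand is essentially still; keep the previous direction

        direction = math.atan2(dy, dx)
        if prev_dir is not None:
            # Smallest angle between the two directions, in [0, pi].
            turn = abs(direction - prev_dir)
            turn = min(turn, 2 * math.pi - turn)
            if math.degrees(turn) > max_turn_deg:
                # The turn happened at point i - 1: close the current segment
                # there and start the next one from the same point.
                segments.append((start, i - 1))
                start = i - 1
        prev_dir = direction

    segments.append((start, len(points) - 1))
    return segments


# Example: the hand moves right for two frames, then up for two frames.
trajectory = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
segments = segment_by_direction(trajectory)
```

In this sketch, adjacent segments share the point at which the turn occurs. Real pose-estimation output is noisy, so a practical version would likely smooth the trajectory before measuring direction changes.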
