Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping

Gesture recognition opens up new ways for humans to interact intuitively with machines. For service robots in particular, gestures can be a valuable additional means of communication, for example to draw the robot's attention to someone or something. Extracting a gesture from video data and classifying it is a challenging task, and a variety of approaches have been proposed over the years. This paper presents a method for gesture recognition in RGB videos that uses OpenPose to extract the pose of a person and Dynamic Time Warping (DTW) in conjunction with One-Nearest-Neighbor (1NN) for time-series classification. The main features of this approach are its independence from any specific hardware and its high flexibility: a new gesture can be added to the classifier by supplying only a few examples of it. We utilize the robustness of the Deep Learning-based OpenPose framework while avoiding the data-intensive task of training a neural network ourselves. We demonstrate the classification performance of our method using a public dataset.
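The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the keypoints have already been extracted (OpenPose is not called here) and that each gesture is a sequence of per-frame feature vectors. The function names `dtw_distance` and `classify_1nn`, the Euclidean frame distance, the lack of normalization or a warping window, and the toy trajectories are all assumptions made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of shape (frames, features)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between two frames (an assumption of this sketch)
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def classify_1nn(query, templates):
    """Return (label, distance) of the template closest to the query under DTW."""
    best_label, best_dist = None, np.inf
    for label, seq in templates:
        dist = dtw_distance(query, seq)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy data: the (x, y) trajectory of a single keypoint; all values are made up.
templates = [
    ("wave", [[0, 0], [1, 1], [0, 2], [1, 3]]),
    ("raise", [[0, 0], [0, 1], [0, 2], [0, 3]]),
]
# The same "wave" motion performed more slowly (some frames repeated).
query = [[0, 0], [0, 0], [1, 1], [0, 2], [0, 2], [1, 3]]

label, dist = classify_1nn(query, templates)
print(label, dist)
```

In this sketch, adding a new gesture amounts to appending a few labeled example sequences to `templates`; no retraining step is involved, which is the flexibility the abstract refers to.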
