Online RGB-D gesture recognition with extreme learning machines

Gesture recognition is needed in many applications, such as human-computer interaction and sign language recognition. The challenge in building a practical recognition system lies not only in reaching acceptable recognition accuracy but also in meeting the requirements of fast online processing. In this paper, we propose a method for online gesture recognition using RGB-D data from a Kinect sensor. Frame-level features are extracted from the RGB frames and from the skeletal model obtained from the depth data, and are then classified by multiple extreme learning machines. The outputs of the classifiers are aggregated to produce the final classification result for each gesture. We evaluate our method on the ChaLearn multi-modal gesture challenge data. The experimental results demonstrate that the method performs effective multi-class gesture recognition in real time.
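The extreme learning machine used for frame-level classification can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the `ELM` class name, the hidden-layer size, and the tanh activation are our own assumptions. The defining property shown here is that the input weights are random and fixed, and only the output weights are solved in closed form with a pseudo-inverse, which is what makes training fast enough for online use.

```python
import numpy as np

class ELM:
    """Single-hidden-layer extreme learning machine for multi-class data.

    Illustrative sketch: random fixed input weights, closed-form output
    weights via the Moore-Penrose pseudo-inverse of the hidden activations.
    """

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        # Input weights and biases are drawn at random and never trained.
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        # One-hot target matrix T, one column per class.
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Hidden-layer activations, then least-squares output weights.
        H = np.tanh(X @ self.W + self.b)
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        # The class whose output neuron responds most strongly wins.
        return self.classes_[np.argmax(H @ self.beta, axis=1)]
```

In a per-frame setting such as the one described above, `predict` would be called on each incoming frame's feature vector, and the per-frame outputs of several such classifiers would then be aggregated (for example by voting or score summation) into a gesture-level decision.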
