Skeleton-based action recognition with extreme learning machines

Action and gesture recognition from motion capture and RGB-D camera sequences has recently emerged as a prominent and challenging research topic. Current methods can usually be applied only to small datasets with a dozen or so different actions, and they often require a long time both to train the models and to classify new sequences. In this paper, we first extract simple but effective frame-level features from the skeletal data and build a recognition system based on the extreme learning machine. We then propose three modeling methods for post-processing the frame-level classification outputs to obtain recognition results at the level of whole action sequences. We test the proposed method on three public datasets with 11 to 40 action classes. On all datasets, the method classifies sequences with accuracies reaching 96-99%, with an average classification time of around 4 ms per sequence on a single CPU core. Fast training and testing, together with high accuracy, make the proposed method readily applicable to online recognition.
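The pipeline in the abstract (frame-level features, an extreme learning machine classifier, then post-processing to a sequence-level label) can be illustrated with a small sketch. It rests on several assumptions that the abstract does not confirm. The feature function `frame_features` (joint positions relative to a root joint) is hypothetical, since the abstract does not say which frame-level features are used. The `ELM` class follows the standard regularized ELM formulation, with a fixed random sigmoid hidden layer and output weights solved by ridge regression; the parameters `n_hidden` and `C` are illustrative. Finally, `classify_sequence` averages frame outputs and takes the argmax. This is only one simple aggregation and is not necessarily any of the three post-processing methods the paper proposes. All names are hypothetical.

```python
import numpy as np


def frame_features(joints, root=0):
    """Hypothetical frame-level feature: joint positions relative to a root joint.

    joints: array of shape (n_frames, n_joints, 3).
    Returns an array of shape (n_frames, n_joints * 3).
    """
    rel = joints - joints[:, root:root + 1, :]  # translate so the root joint sits at the origin
    return rel.reshape(len(joints), -1)


class ELM:
    """Minimal regularized extreme learning machine for multiclass classification (a sketch)."""

    def __init__(self, n_hidden=200, C=1.0, seed=0):
        self.n_hidden = n_hidden  # number of random hidden units (assumed value)
        self.C = C                # regularization trade-off (assumed value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid of a fixed random projection; these input weights are never trained.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, n_classes):
        d = X.shape[1]
        self.W = self.rng.standard_normal((d, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]  # one-hot target matrix
        # Only the output weights are learned, in closed form:
        # beta = (H^T H + I / C)^{-1} H^T T
        A = H.T @ H + np.eye(self.n_hidden) / self.C
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def decision(self, X):
        return self._hidden(X) @ self.beta  # shape (n_samples, n_classes)


def classify_sequence(model, joints):
    """Sequence-level label from the mean of frame-level ELM outputs (one simple choice)."""
    scores = model.decision(frame_features(joints))
    return int(np.argmax(scores.mean(axis=0)))


# Toy usage with synthetic skeletons: two classes differing in one joint's x offset.
rng = np.random.default_rng(1)


def toy_sequence(label, n_frames=20, n_joints=4):
    joints = rng.normal(0.0, 0.05, size=(n_frames, n_joints, 3))
    joints[:, 1, 0] += 1.0 if label == 1 else -1.0  # class-specific pose difference
    return joints + rng.normal(0.0, 1.0, size=3)     # global translation, removed by the features


train_labels = [0, 1] * 10
train_seqs = [toy_sequence(c) for c in train_labels]
X_train = np.vstack([frame_features(s) for s in train_seqs])
y_train = np.repeat(train_labels, 20)  # every frame inherits its sequence's label
model = ELM(n_hidden=100, C=10.0).fit(X_train, y_train, n_classes=2)

test_labels = [0, 1] * 5
predictions = [classify_sequence(model, toy_sequence(c)) for c in test_labels]
accuracy = np.mean(np.array(predictions) == np.array(test_labels))
```

Training involves only one linear solve of size `n_hidden`, and classification is a matrix product followed by an average. This illustrates why ELM-based systems of this kind can train and classify quickly, although the sketch makes no claim about reproducing the reported timings.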
