A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset

This paper aims to determine which human action recognition method performs best among those based on features extracted from RGB-D devices, such as the Microsoft Kinect. We reviewed all the papers that reference MSR Action3D, the most widely used dataset that includes depth information acquired from an RGB-D device. We found that the validation method differs from one work to another, so a direct comparison among works cannot be made. Nevertheless, almost all of the works compare their results with those of others without taking this issue into account. Therefore, we present separate rankings according to the validation methodology used, in order to clarify the existing confusion.
