Shedding Light on People Action Recognition in Social Robotics by Means of Common Spatial Patterns

Action recognition in robotics is a research field that has gained momentum in recent years. In this work, a video activity recognition method is presented, with the ultimate goal of endowing a robot with action recognition capabilities for more natural social interaction. Common Spatial Patterns (CSP), a signal processing approach widely used in electroencephalography (EEG), is applied in a novel way to activity recognition in videos recorded by a humanoid robot. A sequence of skeleton data is treated as a multidimensional signal and filtered with the CSP algorithm; features extracted from the filtered data are then fed to a classifier. A database of 46 individuals performing six different actions was created to test the proposed method. The CSP-based method, combined with a Linear Discriminant Analysis (LDA) classifier, is compared to a Long Short-Term Memory (LSTM) neural network, showing that the former obtains similar or better results than the latter while being simpler.
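Below is a minimal sketch of the kind of pipeline the abstract describes: skeleton sequences treated as multichannel signals, CSP filters obtained from a generalized eigendecomposition of class covariance matrices, log-variance features, and an LDA classifier. The two-class setup, array shapes, and all variable names are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def class_covariance(trials):
    """Average trace-normalized spatial covariance over all trials of one class.
    Each trial is assumed to be an array of shape (n_channels, n_frames),
    where a channel is one joint coordinate over time."""
    covs = []
    for x in trials:
        c = x @ x.T
        covs.append(c / np.trace(c))
    return np.mean(covs, axis=0)

def fit_csp(trials_a, trials_b, n_filters=3):
    """Solve the generalized eigenvalue problem C_a w = lambda (C_a + C_b) w
    and keep the filters with the smallest and largest eigenvalues."""
    ca, cb = class_covariance(trials_a), class_covariance(trials_b)
    eigvals, eigvecs = eigh(ca, ca + cb)          # eigenvalues in ascending order
    idx = np.concatenate([np.arange(n_filters),   # most discriminative for class B
                          np.arange(-n_filters, 0)])  # most discriminative for class A
    return eigvecs[:, idx].T                      # (2 * n_filters, n_channels)

def log_variance_features(trials, W):
    """Project each trial through the CSP filters and take normalized log-variance."""
    feats = []
    for x in trials:
        z = W @ x
        var = z.var(axis=1)
        feats.append(np.log(var / var.sum()))
    return np.array(feats)

# Hypothetical usage with two action classes (e.g. "wave" vs. "walk"):
# W = fit_csp(trials_wave, trials_walk)
# X = log_variance_features(trials_wave + trials_walk, W)
# y = [0] * len(trials_wave) + [1] * len(trials_walk)
# clf = LinearDiscriminantAnalysis().fit(X, y)
```

Since the paper considers six actions, a multi-class scheme (e.g. one-versus-one pairs of CSP filters) would be needed on top of this two-class sketch.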
