Multi-view human action recognition using 2D motion templates based on MHIs and their HOG description

In this study, a new multi-view human action recognition approach is proposed that exploits low-dimensional motion information of actions. Before feature extraction, pre-processing steps are performed to remove noise from the silhouettes caused by imperfect but realistic segmentation. Two-dimensional motion templates based on the motion history image (MHI) are computed for each view/action video. Histograms of oriented gradients (HOGs) are used as an efficient description of the MHIs, which are then classified using a nearest neighbor (NN) classifier. Compared with existing approaches, the proposed method has three advantages: (i) it does not require a fixed number of cameras during the training and testing stages, so missing camera views can be tolerated; (ii) it has lower memory and bandwidth requirements; and hence (iii) it is computationally efficient, which makes it suitable for real-time action recognition. As far as the authors know, this is the first report of results on the MuHAVi-uncut dataset, which has a large number of action categories and a large set of camera views with noisy silhouettes, and these results can serve as a baseline for future work to improve on. Multi-view experiments on this dataset give a high accuracy of 95.4% using leave-one-sequence-out cross-validation, which compares well with similar state-of-the-art approaches.
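
The MHI, HOG and NN stages described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the decay rule (reset to `tau` on foreground, otherwise decrease by one), the simplified HOG (per-cell histograms with a single global L2 normalisation and no block normalisation), the Euclidean distance, and the toy `moving_block` data are all assumptions made for illustration, and the pre-processing of noisy silhouettes is omitted.

```python
import math

def motion_history_image(silhouettes, tau=None):
    """Collapse a list of binary masks (2D lists of 0/1) into one MHI in [0, 1]."""
    tau = tau or len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    mhi = [[0.0] * w for _ in range(h)]
    for frame in silhouettes:
        for y in range(h):
            for x in range(w):
                # Foreground pixels are reset to tau; all others decay by one step.
                mhi[y][x] = float(tau) if frame[y][x] else max(0.0, mhi[y][x] - 1.0)
    return [[v / tau for v in row] for row in mhi]

def hog_descriptor(img, cell=4, bins=9):
    """Simplified HOG: magnitude-weighted, unsigned-orientation histogram per cell."""
    h, w = len(img), len(img[0])
    cells_y, cells_x = h // cell, w // cell
    hist = [0.0] * (cells_y * cells_x * bins)
    for y in range(cells_y * cell):
        for x in range(cells_x * cell):
            # Central differences, with indices clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / (180.0 / bins)), bins - 1)
            c = (y // cell) * cells_x + (x // cell)
            hist[c * bins + b] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

def nearest_neighbor(query, gallery):
    """Return the label of the gallery descriptor closest to the query (Euclidean)."""
    return min(gallery, key=lambda item: math.dist(query, item[0]))[1]

def moving_block(direction, offset=1, size=8, steps=5):
    """Hypothetical toy 'action': a 2x2 block moving right or down over a few frames."""
    frames = []
    for t in range(steps):
        frame = [[0] * size for _ in range(size)]
        for dy in range(2):
            for dx in range(2):
                if direction == "right":
                    frame[offset + dy][t + dx] = 1
                else:
                    frame[t + dy][offset + dx] = 1
        frames.append(frame)
    return frames

def describe(frames):
    """Full per-video pipeline: silhouettes -> MHI -> HOG descriptor."""
    return hog_descriptor(motion_history_image(frames))

# One labelled descriptor per training video; views are not tied to a fixed camera set.
gallery = [(describe(moving_block(d)), d) for d in ("right", "down")]

# Test videos are shifted by one pixel relative to the training videos.
pred_right = nearest_neighbor(describe(moving_block("right", offset=2)), gallery)
pred_down = nearest_neighbor(describe(moving_block("down", offset=2)), gallery)
print(pred_right, pred_down)
```

Because each video is reduced to a single fixed-length descriptor, a gallery of this kind can hold any number of views per action, which is one way to read the abstract's claim that missing camera views can be tolerated. A practical system would normally use an optimised image-processing library rather than nested Python loops.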
