Real-time action recognition using a multilayer descriptor with variable size

Abstract. Video analysis technology has become less expensive and more powerful in terms of storage resources and resolution capacity, promoting progress in a wide range of applications. Video-based human action detection has been used for several tasks in surveillance environments, such as forensic investigation, patient monitoring, medical training, accident prevention, and traffic monitoring. We present a method for action identification based on adaptive training of a multilayer descriptor applied to a single classifier. Cumulative motion shapes (CMSs) are extracted according to the number of frames present in the video. Each CMS serves as a self-contained layer in the training stage while remaining part of the same descriptor. Robust classification is achieved by obtaining an individual classifier response for each layer and taking the dominant result as the final outcome. Experiments are conducted on five public datasets (Weizmann, KTH, MuHAVi, IXMAS, and URADL) to demonstrate the effectiveness of the method in terms of accuracy and real-time performance.
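
The layer-wise decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `frames_per_layer` parameter, and the rule that one layer is created per fixed number of frames are all assumptions made for illustration, and "dominant result" is read here as a simple majority vote over the per-layer labels.

```python
from collections import Counter


def split_into_layers(num_frames, frames_per_layer=10):
    """Return (start, end) frame ranges, one per descriptor layer.

    Assumption: the layer count grows with video length, with at least
    one layer for short clips. frames_per_layer is a hypothetical knob.
    """
    if num_frames <= 0:
        return []
    num_layers = max(1, num_frames // frames_per_layer)
    # Integer arithmetic keeps the boundaries exact and contiguous.
    bounds = [(i * num_frames) // num_layers for i in range(num_layers + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_layers)]


def dominant_label(layer_predictions):
    """Pick the most frequent label among the per-layer responses.

    Assumption: ties go to the label that appears first in the list.
    """
    if not layer_predictions:
        raise ValueError("at least one layer prediction is required")
    return Counter(layer_predictions).most_common(1)[0][0]


# Hypothetical usage: a 35-frame clip yields three layers, and the
# per-layer classifier responses are reduced to one final label.
layers = split_into_layers(35)
final = dominant_label(["walk", "run", "walk"])
```

In this sketch each frame range would feed one cumulative motion shape, and the classifier would be queried once per range before the vote.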
