Human action recognition using motion energy template

Abstract. Human action recognition is an active research topic in computer vision and pattern recognition, with wide real-world applications. We propose an approach to human activity analysis based on the motion energy template (MET), a new high-level video representation. The main idea behind the MET model is that a human action can be expressed as a composition of motion energies acquired in a three-dimensional (3-D) space-time volume with a filter bank. The motion energies are computed directly from raw video sequences, so problems such as object localization and segmentation are avoided entirely. Another important merit of the MET method is its insensitivity to gender, hair, and clothing. We extract MET features by using the Bhattacharyya coefficient to measure the motion-energy similarity between an action-template video and a test video, followed by 3-D max pooling. Feeding these features into a support vector machine, we carried out extensive experiments on two benchmark datasets, Weizmann and KTH. Compared with state-of-the-art approaches such as the variation energy image, dynamic templates, and local motion pattern descriptors, the experimental results show that our MET model is competitive and promising.
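The two feature-extraction steps named in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the motion energies have already been accumulated into nonnegative arrays, and the function names (`bhattacharyya`, `max_pool_3d`) are illustrative.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two nonnegative energy
    distributions; both are normalized to sum to 1 first.
    Returns a value in [0, 1], with 1 for identical distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))

def max_pool_3d(vol, pool=(2, 2, 2)):
    """Non-overlapping 3-D max pooling over a (T, H, W) volume.
    Trailing samples that do not fill a complete pooling cell
    are trimmed so each axis divides evenly by the pool size."""
    t, h, w = (d - d % s for d, s in zip(vol.shape, pool))
    v = vol[:t, :h, :w]
    v = v.reshape(t // pool[0], pool[0],
                  h // pool[1], pool[1],
                  w // pool[2], pool[2])
    return v.max(axis=(1, 3, 5))
```

In this sketch, the similarity scores between the template and test volumes would be arranged as a 3-D response map, down-sampled by `max_pool_3d`, flattened, and passed to an SVM classifier.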