Human action recognition using extreme learning machine based on visual vocabularies

This paper introduces a novel framework for recognizing human actions using hybrid features. The hybrid features combine spatio-temporal features, extracted using the motion-selectivity attribute of the 3D dual-tree complex wavelet transform (3D DT-CWT), with local static features, extracted using the affine SIFT local image detector. The proposed model offers two core advantages: (1) it is significantly faster than traditional approaches because it processes images volumetrically as a '3D box of data' instead of analyzing them frame by frame, and (2) it yields a rich representation of human actions with fewer artifacts, owing to our recently designed full-symmetry complex filter banks, which offer better directionality and shift invariance. No assumptions are made about scene background, location, objects of interest, or point of view. Bidirectional two-dimensional PCA (2D-PCA) is employed for dimensionality reduction, offering an enhanced capability to preserve the structure and correlation among neighboring pixels of a video frame.
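
As an illustration of the dimensionality-reduction step, the following is a minimal sketch of bidirectional 2D-PCA in NumPy. It is not the authors' implementation: the function name `bidirectional_2dpca`, its parameters, and the choice to project centered frames are assumptions made for this example. The sketch keeps each frame as a matrix, which is what allows the method to retain row and column structure, and projects it from both sides.

```python
import numpy as np


def bidirectional_2dpca(frames, n_row_components, n_col_components):
    """Hypothetical helper: reduce a stack of frames with bidirectional 2D-PCA.

    frames: array of shape (n, h, w), one 2D frame per entry.
    Returns (row_proj, col_proj, features) with shapes
    (h, p), (w, q) and (n, p, q), where p and q are the numbers of
    retained row-direction and column-direction components.
    """
    frames = np.asarray(frames, dtype=np.float64)
    centered = frames - frames.mean(axis=0)
    n = centered.shape[0]

    # Column-direction scatter (w x w): average of A^T A over all frames.
    col_cov = np.einsum("nhw,nhv->wv", centered, centered) / n
    # Row-direction scatter (h x h): average of A A^T over all frames.
    row_cov = np.einsum("nhw,ngw->hg", centered, centered) / n

    # eigh returns eigenvalues in ascending order, so reverse the columns
    # to put the leading eigenvectors first before truncating.
    _, col_vecs = np.linalg.eigh(col_cov)
    _, row_vecs = np.linalg.eigh(row_cov)
    col_proj = col_vecs[:, ::-1][:, :n_col_components]
    row_proj = row_vecs[:, ::-1][:, :n_row_components]

    # Project every frame from both sides: Z^T A X.
    features = np.einsum("hp,nhw,wq->npq", row_proj, centered, col_proj)
    return row_proj, col_proj, features
```

Under these assumptions, a stack of 10 frames of size 8 x 6 reduced with 3 row components and 2 column components yields a feature array of shape (10, 3, 2), in contrast to conventional PCA, which would first flatten each frame into a 48-element vector.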
