Spatio-temporal motion field descriptors for the hierarchical action recognition system

Spatio-temporal motion field descriptors have been developed for a hierarchical action recognition system. At the lower level, blurred motion fields representing local features such as speed and direction are estimated from video sequences. At the higher level, feature vectors are calculated by a parallel updating scheme that measures the similarity between the input video sequences and template patches selected from the learning samples. Instead of traditional patch descriptors that rely on computationally expensive methods such as PCA to reduce the dimensionality of feature vectors, we extend earlier hardware-oriented ideas from Averaged Principal-Edge Distribution (APED) face detection by incorporating temporal information. Moreover, since the system employs only simple computations such as summation and Boolean operations, it can be implemented on VLSI chips with little effort to achieve real-time performance. We tested the system on a popular action recognition database and obtained promising recognition performance.

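To make the two-level pipeline concrete, the sketch below estimates a blurred motion field by block matching, quantizes it into a handful of direction codes, and scores similarity to template patches by counting matching codes. The block matching, box blur, four-direction quantization, and toy templates are all assumptions made for illustration; they stand in for the APED-based descriptor and the parallel updating scheme rather than reproducing them, but they keep the same flavour of using only summations and Boolean comparisons.

```python
# A minimal sketch of the two-level idea described above, assuming a
# block-matching motion estimator, a box blur, and a four-direction
# quantization; these choices (and all names below) are illustrative,
# not the paper's actual APED-based descriptor or updating scheme.
import numpy as np


def box_blur(field, k=3):
    """Blur a 2-D field by averaging over a k x k neighbourhood (summation only)."""
    p = k // 2
    padded = np.pad(field, p, mode="edge")
    out = np.zeros_like(field, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + field.shape[0], dx:dx + field.shape[1]]
    return out / (k * k)


def blurred_motion_field(prev_frame, next_frame, block=8, search=2):
    """Lower level: estimate a coarse motion field (per-block displacement)
    by block matching with a sum-of-absolute-differences cost, then blur it."""
    h, w = prev_frame.shape
    gh, gw = h // block, w // block
    vy, vx = np.zeros((gh, gw)), np.zeros((gh, gw))
    for by in range(gh):
        for bx in range(gw):
            y0, x0 = by * block, bx * block
            patch = prev_frame[y0:y0 + block, x0:x0 + block]
            best, best_dy, best_dx = np.inf, 0, 0
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y0 + dy, x0 + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue
                    cand = next_frame[yy:yy + block, xx:xx + block]
                    cost = np.abs(patch - cand).sum()  # SAD: summation only
                    if cost < best:
                        best, best_dy, best_dx = cost, dy, dx
            vy[by, bx], vx[by, bx] = best_dy, best_dx
    return box_blur(vy), box_blur(vx)


def direction_codes(vy, vx, speed_thresh=0.5):
    """Quantize the motion field into four direction codes (0-3) plus a
    'no motion' code (4), so template comparison becomes a Boolean test."""
    speed = np.hypot(vy, vx)
    angle = np.arctan2(vy, vx)                        # range [-pi, pi]
    codes = (((angle + np.pi) / (np.pi / 2)).astype(int)) % 4
    codes[speed < speed_thresh] = 4
    return codes


def feature_vector(codes, template_codes_list):
    """Higher level: one entry per template patch, each the count of matching
    codes (Boolean equality + summation). Entries are independent, so they
    can be updated in parallel, e.g. on dedicated hardware."""
    return np.array([int(np.sum(codes == t)) for t in template_codes_list])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0 = rng.random((64, 64))
    f1 = np.roll(f0, shift=1, axis=1)                 # synthetic rightward motion
    vy, vx = blurred_motion_field(f0, f1)
    codes = direction_codes(vy, vx)
    toy_templates = [np.full_like(codes, c) for c in range(5)]
    print(feature_vector(codes, toy_templates))       # match count per template
```

In this sketch each feature-vector entry depends only on its own template, which is what makes a parallel, hardware-friendly updating scheme natural; in a real system the templates would be patches selected from the learning samples rather than the constant maps used here.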