Hierarchical Space-Time Model Enabling Efficient Search for Human Actions

We propose a five-layer hierarchical space-time model (HSTM) for representing and searching human actions in videos. From a features point of view, both invariance and selectivity are desirable characteristics, which seem to contradict each other. To make these characteristics coexist, we introduce a coarse-to-fine search and verification scheme for action searching, based on the HSTM model. Because going through layers of the hierarchy corresponds to progressively turning the knob between invariance and selectivity, this strategy enables search for human actions ranging from rapid movements of sports to subtle motions of facial expressions. The introduction of the Histogram of Gabor Orientations feature makes the searching for actions go smoothly across the hierarchical layers of the HSTM model. The efficient matching is achieved by applying integral histograms to compute the features in the top two layers. The HSTM model was tested on three selected challenging video sequences and on the KTH human action database. And it achieved improvement over other state-of-the-art algorithms. These promising results validate that the HSTM model is both selective and robust for searching human actions.

[1]  D. H. Hubel,et al.  RECEPTIVE FIELDS, BINOCULAR AND FUNCTIONAL ARCHITECTURE IN THE CAT’S VISUAL CORTEX , 1962 .

[2]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[3]  D H HUBEL,et al.  RECEPTIVE FIELDS AND FUNCTIONAL ARCHITECTURE IN TWO NONSTRIATE VISUAL AREAS (18 AND 19) OF THE CAT. , 1965, Journal of neurophysiology.

[4]  Solomon Kullback,et al.  Information Theory and Statistics , 1970, The Mathematical Gazette.

[5]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[6]  John K. Tsotsos,et al.  Modeling Visual Attention via Selective Tuning , 1995, Artif. Intell..

[7]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[8]  James L. Crowley,et al.  Probabilistic recognition of activity using local appearance , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[9]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[10]  James L. Crowley,et al.  A Probabilistic Sensor for the Perception and Recognition of Activities , 2000, ECCV.

[11]  M. Irani,et al.  Event-Based Video Analysis, , 2001 .

[12]  Lihi Zelnik-Manor,et al.  Event-based analysis of video , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[13]  Rama Chellappa,et al.  Activity recognition using the dynamics of the configuration of interacting objects , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[14]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15]  David A. Forsyth,et al.  Automatic Annotation of Everyday Movements , 2003, NIPS.

[16]  P. Lennie Receptive fields , 2003, Current Biology.

[17]  T. Poggio,et al.  Cognitive neuroscience: Neural mechanisms for the recognition of biological movements , 2003, Nature Reviews Neuroscience.

[18]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[19]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[20]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[21]  Ankur Agarwal,et al.  Learning to track 3D human motion from silhouettes , 2004, ICML.

[22]  B. Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[23]  Bernt Schiele,et al.  Recognition without Correspondence using Multidimensional Receptive Field Histograms , 2004, International Journal of Computer Vision.

[24]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[25]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[26]  Martial Hebert,et al.  Efficient visual event detection using volumetric features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[27]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  Michal Irani,et al.  Detecting Irregularities in Images and in Video , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[29]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[30]  Mubarak Shah,et al.  Recognizing human actions in videos acquired by uncalibrated moving cameras , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[31]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[32]  Fatih Murat Porikli,et al.  Integral histogram: a fast way to extract histograms in Cartesian spaces , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[33]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[34]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[35]  Thomas Serre,et al.  A Biologically Inspired System for Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[36]  Rama Chellappa,et al.  Epitomic Representation of Human Activities , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Juan Carlos Niebles,et al.  A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Tae-Kyun Kim,et al.  Learning Motion Categories using both Semantic and Structural Information , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Tae-Kyun Kim,et al.  Tensor Canonical Correlation Analysis for Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[42]  Evgueni A. Haroutunian,et al.  Information Theory and Statistics , 2011, International Encyclopedia of Statistical Science.