Searching Human Behaviors using Spatial-Temporalwords

This paper proposes an approach to searching human behaviors in videos using spatial-temporal words which are learnt from unlabelled data with various human behaviors through unsupervised learning. Both the query and the searched videos are represented by codewords frequencies, which capture the intrinsic information of motion and appearance of human behaviors. This representation further enables us to make use of integral histograms to accelerate the searching procedure. The performance also benefits from our feature representation that, through a MAX-like operation, may simulate the cortical equivalent of the machine-vision "window of analysis"(M. Riesenhuber and T. Poggio, 1999). Examples of challenging sequences with complex behaviors, including tennis and ballet, are shown.

[1]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .

[3]  Michael J. Black Explaining optical flow events with parameterized spatio-temporal models , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[4]  Michal Irani,et al.  Detecting Irregularities in Images and in Video , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[5]  Lihi Zelnik-Manor,et al.  Event-based analysis of video , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[6]  N. Goodwin,et al.  Learning to Detect Objects in Images via a Sparse, Part-Based Representation , 2004 .

[7]  Eli Shechtman,et al.  Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[11]  Solomon Kullback,et al.  Information Theory and Statistics , 1970, The Mathematical Gazette.

[12]  D H HUBEL,et al.  RECEPTIVE FIELDS AND FUNCTIONAL ARCHITECTURE IN TWO NONSTRIATE VISUAL AREAS (18 AND 19) OF THE CAT. , 1965, Journal of neurophysiology.

[13]  Thomas Hofmann,et al.  Probabilistic latent semantic indexing , 1999, SIGIR '99.

[14]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[16]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17]  Fatih Murat Porikli,et al.  Integral histogram: a fast way to extract histograms in Cartesian spaces , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..