Space-Time Behavior-Based Correlation, or: How to Tell If Two Underlying Motion Fields Are Similar Without Computing Them?

We introduce a behavior-based similarity measure that tells us whether two different space-time intensity patterns of two different video segments could have resulted from a similar underlying motion field. This is done directly from the intensity information, without explicitly computing the underlying motions. Such a measure allows us to detect similarity between video segments of differently dressed people performing the same type of activity. It requires no foreground/background segmentation, no prior learning of activities, and no motion estimation or tracking. Using this behavior-based similarity measure, we extend the notion of two-dimensional image correlation into the three-dimensional space-time volume, thus allowing dynamic behaviors and actions to be correlated. Small space-time video segments (small video clips) are "correlated" against entire video sequences in all three dimensions (x, y, and t). Peak correlation values correspond to video locations with similar dynamic behaviors. Our approach can detect very complex behaviors in video sequences (for example, ballet movements, pool dives, and running water), even when multiple complex activities occur simultaneously within the field of view of the camera. We further show its robustness to small changes in scale and orientation of the correlated behavior.
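
To make the sliding-volume search over (x, y, t) concrete, below is a minimal Python sketch that uses plain zero-mean 3D cross-correlation as a stand-in for the behavior-based measure described above; the function name spacetime_correlation and the synthetic array sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.signal import fftconvolve  # using FFT-based convolution is our choice, not the paper's

def spacetime_correlation(video, template):
    """Slide a small space-time template (frames, rows, cols) over a larger
    video volume with the same axis layout and return a 3D correlation map.

    Note: this is ordinary zero-mean intensity correlation, shown only to
    illustrate the exhaustive search over (x, y, t) and peak picking; the
    paper's measure compares behavior, not raw intensities.
    """
    t = template - template.mean()
    # Flip the template along all three axes so convolution implements correlation.
    return fftconvolve(video, t[::-1, ::-1, ::-1], mode="valid")

# Usage example on synthetic data: locate a short clip inside a longer sequence.
video = np.random.rand(120, 144, 180)        # (frames, height, width)
template = video[40:55, 60:90, 80:120]       # a small space-time segment
corr_map = spacetime_correlation(video, template)
t0, y0, x0 = np.unravel_index(np.argmax(corr_map), corr_map.shape)
print("correlation peak at frame", t0, "row", y0, "col", x0)
```

The sketch captures only the mechanics of correlating a small space-time clip against an entire sequence and reading off peak locations; the behavior-based measure of the paper replaces the raw-intensity comparison so that, for example, differently dressed people performing the same activity still produce high correlation values.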
