Video Scene Understanding Using Multi-scale Analysis

We propose a novel method for automatically discovering the key motion patterns in a scene by observing it over an extended period. Our method does not rely on object detection and tracking; instead, it uses a low-level feature, the direction of pixel-wise optical flow. We first divide the video into clips and estimate a sequence of flow fields. Each moving pixel is quantized based on its location and motion direction, yielding a bag-of-words representation of each clip. We then screen the words using a conditional entropy measure to retain only the informative ones. The retained words are embedded using the diffusion maps framework, which maps points on a manifold into a lower-dimensional space while preserving the intrinsic local geometric structure. Finally, the embedded words are clustered to discover the key motion patterns. Because the diffusion map embedding involves a diffusion time parameter, key motion patterns can be detected at different scales through multi-scale analysis. In addition, clips represented in terms of the frequencies of these motion patterns can themselves be clustered to determine multiple dominant motion patterns that occur simultaneously, providing a further level of scene understanding. We have tested our approach on two challenging datasets and obtained promising results.
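The pipeline above can be sketched in a few numpy functions: screen bag-of-words counts by conditional entropy, embed the surviving word descriptors with a diffusion map (whose diffusion time `t` controls the scale of analysis), and cluster the embedding. This is only an illustrative sketch, not the paper's implementation; the screening threshold, the Gaussian kernel bandwidth heuristic, and the simple k-means clusterer are all assumptions.

```python
import numpy as np

def conditional_entropy_screen(counts, thresh):
    """Keep words with low conditional entropy over clips.

    counts: (n_clips, n_words) bag-of-words matrix. A word spread uniformly
    over all clips has high entropy and is assumed uninformative; the exact
    criterion in the paper may differ -- this thresholding is an assumption.
    """
    p = counts / np.maximum(counts.sum(axis=0, keepdims=True), 1e-12)
    h = -(p * np.log(p + 1e-12)).sum(axis=0)           # H(clip | word)
    return h < thresh                                   # mask of retained words

def diffusion_embed(X, t=2, k=2, eps=None):
    """Embed rows of X with a diffusion map; t is the diffusion time."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    eps = np.median(D2) if eps is None else eps         # bandwidth heuristic
    K = np.exp(-D2 / eps)                               # Gaussian affinity
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))                     # symmetric normalization
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]                    # right eigvecs of P = D^-1 K
    return (vals[1:k + 1] ** t) * psi[:, 1:k + 1]       # drop trivial eigenvector

def kmeans(X, k, iters=50):
    """Minimal k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(((X[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```

Raising the diffusion time `t` damps the smaller eigenvalues more strongly, so clustering the embedding at increasing `t` recovers progressively coarser motion patterns, which is the multi-scale behavior described above.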
