Learning Semantic Motion Patterns for Dynamic Scenes by Improved Sparse Topical Coding

With the proliferation of cameras in public areas, it becomes increasingly desirable to develop fully automated surveillance and monitoring systems. In this paper, we propose a novel unsupervised approach to automatically explore motion patterns occurring in dynamic scenes under an improved sparse topical coding (STC) framework. Given an input video with a fixed camera, we first segment the whole video into a sequence of clips (documents) without overlapping. Optical flow features are extracted from each pair of consecutive frames, and quantized into discrete visual words. Then the video is represented by a word-document hierarchical topic model through a generative process. Finally, an improved sparse topical coding approach is proposed for model learning. The semantic motion patterns (latent topics) are learned automatically and each video clip is represented as a weighted summation of these patterns with only a few nonzero coefficients. The proposed approach is purely data-driven and scene independent (not an object-class specific), which make it suitable for very large range of scenarios. Experiments demonstrate that our approach outperforms the state-of-the art technologies in dynamic scene analysis.

[1]  Eric P. Xing,et al.  Sparse Topical Coding , 2011, UAI.

[2]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Shaogang Gong,et al.  A Markov Clustering Topic Model for mining behaviour in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Leonidas J. Guibas,et al.  Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.

[5]  Junsong Yuan,et al.  Sparse reconstruction cost for abnormal event detection , 2011, CVPR 2011.

[6]  Horst Bischof,et al.  A Duality Based Approach for Realtime TV-L1 Optical Flow , 2007, DAGM-Symposium.

[7]  W. Eric L. Grimson,et al.  Unsupervised Activity Perception by Hierarchical Bayesian Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Luc Van Gool,et al.  What's going on? Discovering spatio-temporal dependencies in dynamic scenes , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Tianzhu Zhang,et al.  Learning semantic scene models by object classification and trajectory clustering , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Mubarak Shah,et al.  Scene understanding by statistical modeling of motion patterns , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Mubarak Shah,et al.  Video Scene Understanding Using Multi-scale Analysis , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[13]  Wotao Yin,et al.  Bregman Iterative Algorithms for (cid:2) 1 -Minimization with Applications to Compressed Sensing ∗ , 2008 .

[14]  Shaogang Gong,et al.  Scene Segmentation for Behaviour Correlation , 2008, ECCV.