Summarizing high-level scene behavior

We present several novel techniques to summarize the high-level behavior in surveillance video. Our proposed methods can employ either optical flow or trajectories as input, and incorporate spatial and temporal information together, which improve upon existing approaches for summarization. To begin, we extract common pathway regions by performing graph-based clustering on similarity matrices describing the relationships between location/orientation states. We then employ the activities along the pathway regions to extract the aggregate behavioral patterns throughout scenes. We show how our summarization methods can be applied to detect anomalies, retrieve video clips of interest, and generate adaptive-speed summary videos. We examine our approaches on multiple complex urban scenes and present experimental results.

[1]  Denis Simakov,et al.  Summarizing visual data using bidirectional similarity , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Yael Pritch,et al.  Making a Long Video Short: Dynamic Video Synopsis , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  J. S. Jimmy Li,et al.  Color Filter Array Demosaicking Using High-Order Interpolation Techniques With a Weighted Median Filter for Sharp Color Edge Preservation , 2009, IEEE Transactions on Image Processing.

[4]  Mubarak Shah,et al.  Probabilistic Modeling of Scene Dynamics for Applications in Visual Surveillance , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jianping Fan,et al.  Exploring video content structure for hierarchical summarization , 2004, Multimedia Systems.

[7]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  W. Eric L. Grimson,et al.  Trajectory analysis and semantic region modeling using a nonparametric Bayesian model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Serge Miguet,et al.  Common Motion Map Based on Codebooks , 2009, ISVC.

[10]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[11]  James W. Davis,et al.  Extracting Pathlets FromWeak Tracking Data , 2010, 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance.

[12]  Gunther Heidemann,et al.  Information-based adaptive fast-forward for visual surveillance , 2011, Multimedia Tools and Applications.

[13]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[14]  W. Eric L. Grimson,et al.  Learning Semantic Scene Models by Trajectory Analysis , 2006, ECCV.

[15]  Nebojsa Jojic,et al.  Adaptive Video Fast Forward , 2005, Multimedia Tools and Applications.

[16]  W. Eric L. Grimson,et al.  Learning Patterns of Activity Using Real-Time Tracking , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Mikel D. Rodriguez CRAM: Compact representation of actions in movies , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Yael Pritch,et al.  Clustered Synopsis of Surveillance Video , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[19]  Mubarak Shah,et al.  Video Scene Understanding Using Multi-scale Analysis , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20]  Yiannis Kompatsiaris,et al.  Introduction to the special issue on image and video retrieval: theory and applications , 2010, Multimedia Tools and Applications.

[21]  Shaogang Gong,et al.  Scene Segmentation for Behaviour Correlation , 2008, ECCV.

[22]  Yael Pritch,et al.  This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2008 1 Non-Chronological Video , 2022 .

[23]  Tim J. Ellis,et al.  Path detection in video surveillance , 2002, Image Vis. Comput..

[24]  Alan Hanjalic,et al.  An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis , 1999, IEEE Trans. Circuits Syst. Video Technol..

[25]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26]  Janusz Konrad,et al.  Video Condensation by Ribbon Carving , 2009, IEEE Transactions on Image Processing.

[27]  Brendan J. Frey,et al.  Video Epitomes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  Yee Leung,et al.  Clustering by Scale-Space Filtering , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[29]  W. Eric L. Grimson,et al.  Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Yasuyuki Matsushita,et al.  Space-Time Video Montage , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[31]  Michael Spann,et al.  A new approach to clustering , 1990, Pattern Recognit..

[32]  Shin'ichi Satoh,et al.  Real Time Tunnel Based Video Summarization Using Direct Shift Collision Detection , 2010, PCM.

[33]  Jitendra Malik,et al.  Learning a classification model for segmentation , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[34]  James W. Davis,et al.  Extracting Pathlets From Weak Tracking Data ∗ , 2010 .

[35]  Brendan J. Frey,et al.  Epitomic analysis of appearance and shape , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[36]  Shaogang Gong,et al.  A Markov Clustering Topic Model for mining behaviour in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.