A representative-based framework for parsing and summarizing events in surveillance videos

This paper presents a novel representative-based framework for parsing and summarizing events in long surveillance videos. The proposed framework first extracts object blob sequences and uses them to represent the events in a surveillance video. A sequence filtering strategy then detects and eliminates noisy blob sequences based on their spatial and temporal characteristics. After clustering the blob sequences into different event types, we introduce a representative-based model that integrates location, size, and appearance cues to select a representative blob sequence from each cluster and creates a snapshot image for each representative blob sequence. Based on the blob-sequence clustering and representative-sequence selection results, two schemes are proposed to summarize the contents of the input surveillance video: (1) a type-based scheme, which shows the snapshot images to users and creates a summary video for a specific event cluster according to the snapshot image the user selects; and (2) a representative-based scheme, which creates a summary video from the extracted representative blob sequences only. Experimental results show that our approach creates more effective and better-organized summaries than state-of-the-art methods.
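The representative-selection step can be illustrated with a minimal sketch. It is not the model proposed in the paper: it assumes each blob sequence has already been assigned to a cluster and reduced to a mean location, a mean size, and an appearance vector, and it picks the cluster medoid under a weighted sum of the three cue distances. The `BlobSequence` fields, the weights `w_loc`, `w_size`, `w_app`, and the medoid criterion are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class BlobSequence:
    """Hypothetical per-sequence summary; field names are assumptions."""
    seq_id: int
    cluster: int        # event-type label from a prior clustering step
    center: tuple       # mean blob centre (x, y), normalised to [0, 1]
    size: float         # mean blob area, normalised to [0, 1]
    appearance: tuple   # appearance descriptor, e.g. a short colour histogram


def sequence_distance(a, b, w_loc=1.0, w_size=1.0, w_app=1.0):
    """Weighted sum of location, size, and appearance distances (assumed form)."""
    return (w_loc * dist(a.center, b.center)
            + w_size * abs(a.size - b.size)
            + w_app * dist(a.appearance, b.appearance))


def select_representatives(sequences, **weights):
    """Return {cluster label: medoid sequence} for the given sequences."""
    clusters = {}
    for seq in sequences:
        clusters.setdefault(seq.cluster, []).append(seq)

    representatives = {}
    for label, members in clusters.items():
        # The medoid is the member with the smallest total distance to the
        # other members of its cluster.
        representatives[label] = min(
            members,
            key=lambda cand: sum(
                sequence_distance(cand, other, **weights) for other in members
            ),
        )
    return representatives
```

Under these assumptions, a representative-based summary would be assembled from the values of `select_representatives(sequences)`, while a type-based summary would use all members of the cluster whose representative snapshot the user selects.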
