Multicamera Video Summarization from Optimal Reconstruction

We propose a principled approach to video summarization that uses optimal reconstruction as the metric guiding creation of the summary. The spatio-temporal video patches included in the summary are treated as observations of the local motion in the original input video, and are chosen to minimize the reconstruction error of the missing observations under a set of learned predictive models. The method is demonstrated on fixed-viewpoint video sequences and shown to generalize to multi-camera systems with disjoint views, where activity already summarized in one view can inform the summary of another. The results show that this approach can significantly reduce, or even eliminate, summary patches whose activity is already expected given the other patches in the summary, leading to a more concise output.
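The selection criterion can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `pair_error`, `baseline`, `reconstruction_error`, and `greedy_summary` are hypothetical, the learned predictive models are reduced to a precomputed table of pairwise prediction errors, and a simple greedy search stands in for whatever optimization the authors actually use.

```python
from typing import List, Sequence


def reconstruction_error(
    selected: Sequence[int],
    pair_error: List[List[float]],
    baseline: List[float],
) -> float:
    """Total error of reconstructing every patch left out of the summary.

    pair_error[i][j]: assumed error of predicting patch j from patch i
                      under some learned model (hypothetical input).
    baseline[j]:      assumed error of predicting patch j with no
                      observations at all (hypothetical input).
    """
    chosen = set(selected)
    total = 0.0
    for j in range(len(baseline)):
        if j in chosen:
            continue  # patches in the summary are observed directly
        # Use the best available predictor, or the prior if nothing helps.
        total += min([baseline[j]] + [pair_error[i][j] for i in chosen])
    return total


def greedy_summary(
    pair_error: List[List[float]],
    baseline: List[float],
    budget: int,
) -> List[int]:
    """Greedily add the patch that most reduces the reconstruction error."""
    selected: List[int] = []
    for _ in range(budget):
        candidates = [p for p in range(len(baseline)) if p not in selected]
        if not candidates:
            break
        best = min(
            candidates,
            key=lambda p: reconstruction_error(selected + [p], pair_error, baseline),
        )
        selected.append(best)
    return selected


# Toy data: patches 0 and 1 predict each other well, as do patches 2 and 3.
pair_error = [
    [0.0, 0.1, 5.0, 5.0],
    [0.1, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 0.2],
    [5.0, 5.0, 0.2, 0.0],
]
baseline = [4.0, 4.0, 4.0, 4.0]

summary = greedy_summary(pair_error, baseline, budget=2)
print(summary, reconstruction_error(summary, pair_error, baseline))
```

In this toy setting the second pick skips patch 1, because its activity is already well predicted by patch 0, and takes a patch from the other group instead. In the same spirit, patches from a second camera could simply be additional indices in the table, so that a patch selected in one view lowers the expected error of patches in another.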
