Video Précis: Highlighting Diverse Aspects of Videos

Summarizing long unconstrained videos is gaining importance in surveillance, web-based video browsing, and video-archival applications. Summarizing a video requires one to identify key aspects that contain the essence of the video. In this paper, we propose an approach that optimizes two criteria that a video summary should embody. The first criterion, “coverage,” requires that the summary be able to represent the original video well. The second criterion, “diversity,” requires that the elements of the summary be as distinct from each other as possible. Given a user-specified summary length, we propose a cost function to measure the quality of a summary. The problem of generating a précis is then reduced to a combinatorial optimization problem of minimizing the proposed cost function. We propose an efficient method to solve the optimization problem. We demonstrate through experiments (on KTH data, unconstrained skating video, a surveillance video, and a YouTube home video) that optimizing the proposed criterion results in meaningful video summaries over a wide range of scenarios. Summaries thus generated are then evaluated using both quantitative measures and user studies.

[1]  Yang Wang,et al.  Unsupervised Discovery of Action Classes , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[2]  Edward H. Adelson,et al.  Representing moving images with layers , 1994, IEEE Trans. Image Process..

[3]  Denis Simakov,et al.  Summarizing visual data using bidirectional similarity , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Regunathan Radhakrishnan,et al.  Video summarization using descriptors of motion activity: A motion activity based approach to key-frame extraction from video shots , 2001, J. Electronic Imaging.

[5]  Shih-Fu Chang,et al.  Structure analysis of soccer video with domain knowledge and hidden Markov models , 2004, Pattern Recognit. Lett..

[6]  Yael Pritch,et al.  Webcam Synopsis: Peeking Around the World , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Stephen W. Smoliar,et al.  Video parsing, retrieval and browsing: an integrated and content-based solution , 1997, MULTIMEDIA '95.

[8]  Alan Hanjalic,et al.  An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis , 1999, IEEE Trans. Circuits Syst. Video Technol..

[9]  Behzad Shahraray,et al.  Automatic generation of pictorial transcripts of video programs , 1995, Electronic Imaging.

[10]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Evimaria Terzi,et al.  ManyAspects: a system for highlighting diverse concepts in documents , 2008, Proc. VLDB Endow..

[12]  Fatih Murat Porikli,et al.  Region Covariance: A Fast Descriptor for Detection and Classification , 2006, ECCV.

[13]  Jeho Nam,et al.  Video abstract of video , 1999, 1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No.99TH8451).

[14]  Chong-Wah Ngo,et al.  Video Summarization , 2009, Encyclopedia of Database Systems.

[15]  Ba Tu Truong,et al.  Video abstraction: A systematic review and classification , 2007, TOMCCAP.

[16]  Aggelos K. Katsaggelos,et al.  MINMAX optimal video summarization , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[17]  Regunathan Radhakrishnan,et al.  Generation of sports highlights using motion activity in combination with a common audio feature extraction framework , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[18]  P. Anandan,et al.  Mosaic based representations of video sequences and their applications , 1995, Proceedings of IEEE International Conference on Computer Vision.

[19]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[20]  Stefano Soatto,et al.  Dynamic Textures , 2003, International Journal of Computer Vision.

[21]  Lihi Zelnik-Manor,et al.  Event-based analysis of video , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[22]  Bobby Bodenheimer,et al.  Synthesis and evaluation of linear motion transitions , 2008, TOGS.

[23]  A. Murat Tekalp,et al.  Hierarchical temporal video segmentation and content characterization , 1997, Other Conferences.

[24]  A. Murat Tekalp,et al.  Multiscale content extraction and representation for video indexing , 1997, Other Conferences.

[25]  Rama Chellappa,et al.  Unsupervised view and rate invariant clustering of video sequences q , 2009 .

[26]  Ying Li,et al.  An Overview of Video Abstraction Techniques , 2001 .

[27]  Stephen W. Smoliar,et al.  An integrated system for content-based video retrieval and browsing , 1997, Pattern Recognit..

[28]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[29]  Stephen W. Smoliar,et al.  Content based video indexing and retrieval , 1994, IEEE MultiMedia.

[30]  Jhing-Fa Wang,et al.  A Novel Video Summarization Based on Mining the Story-Structure and Semantic Relations Among Concept Entities , 2009, IEEE Transactions on Multimedia.

[31]  Harry W. Agius,et al.  Video summarisation: A conceptual framework and survey of the state of the art , 2008, J. Vis. Commun. Image Represent..

[32]  Yueting Zhuang,et al.  Adaptive key frame extraction using unsupervised clustering , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[33]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Regunathan Radhakrishnan,et al.  Video Summarization Using Mpeg-7 Motion Activity and Audio Descriptors , 2003 .

[35]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[36]  Elizabeth D. Liddy,et al.  Advances in Automatic Text Summarization , 2001, Information Retrieval.

[37]  Ariel Shamir,et al.  Improved seam carving for video retargeting , 2008, ACM Trans. Graph..

[38]  Shih-Fu Chang,et al.  Real-time personalized sports video filtering and summarization , 2001, MULTIMEDIA '01.

[39]  Regunathan Radhakrishnan,et al.  A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia , 2006, EURASIP J. Adv. Signal Process..

[40]  Nuno Vasconcelos,et al.  A spatiotemporal motion model for video summarization , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[41]  Jung Hwan Oh,et al.  Video Abstraction , 2009, Encyclopedia of Database Systems.

[42]  David S. Doermann,et al.  Video summarization by curve simplification , 1998, MULTIMEDIA '98.

[43]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[44]  Inderjit S. Dhillon,et al.  Kernel k-means: spectral clustering and normalized cuts , 2004, KDD.

[45]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[46]  Ahmed K. Elmagarmid,et al.  InsightVideo: toward hierarchical video content organization for efficient browsing, summarization and retrieval , 2005, IEEE Transactions on Multimedia.