Near-lossless semantic video summarization and its applications to video analysis

The ever-increasing volume of video content on the Web has created profound challenges for developing efficient indexing and search techniques to manage video data. Conventional techniques such as video compression and summarization strive for two often conflicting goals: low storage and high visual and semantic fidelity. To balance the goals of video compression and summarization, this article presents a novel approach, called Near-Lossless Semantic Summarization (NLSS), which summarizes a video stream with minimal loss of high-level semantic information while using an extremely small amount of metadata. The summary consists of compressed image and audio streams, together with metadata describing the temporal structure and motion information. Even at a very low bit rate (around 1/40 of that of the H.264 baseline, where traditional compression techniques can hardly preserve acceptable visual fidelity), NLSS can still be applied to many video-oriented tasks, such as visualization, indexing and browsing, duplicate detection, and concept detection. We evaluate NLSS on TRECVID and other video collections and demonstrate that it significantly reduces storage consumption while preserving high-level semantic fidelity.
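The abstract describes an NLSS summary as three parts: compressed image streams, a compressed audio stream, and compact metadata for temporal structure and motion. The following is a minimal sketch of such a container in Python, assuming hypothetical names (`NLSSSummary`, `ShotMetadata`, `storage_ratio`), hypothetical motion labels, and a fixed per-shot metadata cost. It is not the authors' actual format; it only illustrates how the storage of a summary could be compared against a baseline encode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShotMetadata:
    """Hypothetical per-shot metadata: temporal structure plus camera motion."""
    start_frame: int
    end_frame: int
    keyframe_index: int  # index into NLSSSummary.keyframes
    motion: str          # assumed labels such as "pan", "zoom", "static"


@dataclass
class NLSSSummary:
    """Hypothetical summary container: compressed keyframes, audio, metadata."""
    keyframes: List[bytes]  # compressed keyframe images
    audio: bytes            # compressed audio stream
    shots: List[ShotMetadata] = field(default_factory=list)

    def size_bytes(self, metadata_bytes_per_shot: int = 16) -> int:
        # Assumption: each shot's metadata costs a fixed number of bytes.
        image_bytes = sum(len(k) for k in self.keyframes)
        metadata_bytes = metadata_bytes_per_shot * len(self.shots)
        return image_bytes + len(self.audio) + metadata_bytes


def storage_ratio(summary_bytes: int, baseline_bytes: int) -> float:
    """Fraction of a baseline encode (e.g. H.264) occupied by the summary."""
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    return summary_bytes / baseline_bytes


if __name__ == "__main__":
    summary = NLSSSummary(
        keyframes=[b"\x00" * 3000, b"\x00" * 3000],
        audio=b"\x00" * 4000,
        shots=[ShotMetadata(0, 99, 0, "static"), ShotMetadata(100, 249, 1, "pan")],
    )
    total = summary.size_bytes()
    print(total, storage_ratio(total, 401280))
```

With these toy sizes (two 3,000-byte keyframes, a 4,000-byte audio stream, and two shots), `size_bytes()` returns 10,032 bytes, and against a hypothetical 401,280-byte baseline encode `storage_ratio` gives 0.025, i.e. 1/40. The numbers are invented only to exercise the arithmetic; they are not measurements from the paper. The design keeps metadata as a small fixed cost per shot, so the total size is dominated by the keyframes and audio.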
