InsightVideo: toward hierarchical video content organization for efficient browsing, summarization and retrieval

Hierarchical video browsing and feature-based video retrieval are two standard methods for accessing video content. Very little research, however, has addressed the benefits of integrating these two methods for more effective and efficient video content access. In this paper, we introduce InsightVideo, a video analysis and retrieval system that combines a video content hierarchy with hierarchical browsing and retrieval for efficient video access. We propose several video processing techniques to organize the content hierarchy of a video. We first apply a camera motion classification and key-frame extraction strategy that operates in the compressed domain to extract video features. Shot grouping, scene detection and pairwise scene clustering strategies are then applied to construct the video content hierarchy. We introduce a video similarity evaluation scheme that operates at different levels (key-frame, shot, group, scene, and video). By combining the video content hierarchy with this similarity evaluation scheme, hierarchical video browsing and retrieval are seamlessly integrated for efficient content access. We construct a progressive video retrieval scheme that refines user queries through the interaction of browsing and retrieval. Experimental results and comparisons of camera motion classification, key-frame extraction, scene detection, and video retrieval are presented to validate the effectiveness and efficiency of the proposed algorithms and the performance of the system.
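To make the five-level hierarchy and the coarse-to-fine retrieval idea concrete, the following is a minimal sketch and not the system's actual implementation. It assumes that each key-frame carries a normalized histogram, that histogram intersection is the key-frame similarity, and that a higher-level node scores as its best-matching descendant key-frame. The names `Node`, `similarity` and `progressive_search` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """One node of a hypothetical content hierarchy:
    video > scene > group > shot > key-frame."""
    level: str
    name: str
    feature: Sequence[float] = ()  # assumption: only key-frames store a feature
    children: List["Node"] = field(default_factory=list)


def histogram_intersection(a, b):
    """Similarity in [0, 1] between two normalized histograms (assumed feature)."""
    return sum(min(x, y) for x, y in zip(a, b))


def similarity(query, node):
    """Assumed multi-level rule: a node scores as its best descendant key-frame."""
    if not node.children:
        return histogram_intersection(query, node.feature)
    return max(similarity(query, child) for child in node.children)


def progressive_search(query, root):
    """Walk down the hierarchy, keeping the best-matching child at each level."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: similarity(query, child))
        path.append(node)
    return path


# Toy hierarchy with three key-frames described by 3-bin histograms.
shot1 = Node("shot", "shot-1", children=[Node("key-frame", "kf-1", [0.9, 0.1, 0.0])])
shot2 = Node("shot", "shot-2", children=[Node("key-frame", "kf-2", [0.1, 0.8, 0.1])])
shot3 = Node("shot", "shot-3", children=[Node("key-frame", "kf-3", [0.0, 0.2, 0.8])])
scene1 = Node("scene", "scene-1", children=[Node("group", "group-1", children=[shot1, shot2])])
scene2 = Node("scene", "scene-2", children=[Node("group", "group-2", children=[shot3])])
video = Node("video", "demo", children=[scene1, scene2])

path = progressive_search([0.0, 0.1, 0.9], video)
print(" > ".join(node.name for node in path))
# prints: demo > scene-2 > group-2 > shot-3 > kf-3
```

In this sketch a browsing step corresponds to choosing a child by hand, while a retrieval step chooses it by similarity, so the two can be interleaved at any level of the tree.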
