A Multiple Visual Models Based Perceptive Analysis Framework for Multilevel Video Summarization

In this paper, we propose a generic framework for human perception analysis in video understanding based on multiple visual cues. Video features that prominently influence human perception, such as motion, contrast, special scenes, and statistical rhythm, are first extracted and modeled. A perception curve that tracks changes in human perception is then constructed from these individual models using either a linear or a priority-based fusion approach. As an important application of the perceptive analysis framework, a feasible scheme for video summarization is implemented to demonstrate the validity, robustness, and generality of the proposed framework. The frames that correspond to the peak points in the individual models and in the fusion curve are extracted as multilevel summarizations that include video keywords, keyframes, and dynamic segments. Subjective evaluations from a supplementary volunteer study on the video summarizations indicate that the analysis framework is effective and offers a promising approach to semantic video management, access, and understanding.
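
The fusion and peak-picking steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the min-max normalization, the weights, the simple local-maximum test, and the function names (normalize, linear_fusion, peak_frames) are all assumptions made for illustration, and only the linear fusion variant is shown because the abstract does not define the priority-based one.

```python
from typing import Dict, List, Sequence


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max scale one per-frame model curve to [0, 1] (an assumed choice)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def linear_fusion(models: Dict[str, Sequence[float]],
                  weights: Dict[str, float]) -> List[float]:
    """Weighted sum of normalized model curves; weights are rescaled to sum to 1."""
    total = sum(weights[name] for name in models)
    n = len(next(iter(models.values())))
    fused = [0.0] * n
    for name, curve in models.items():
        if len(curve) != n:
            raise ValueError("all model curves must cover the same frames")
        w = weights[name] / total
        for i, v in enumerate(normalize(curve)):
            fused[i] += w * v
    return fused


def peak_frames(curve: Sequence[float], min_height: float = 0.0) -> List[int]:
    """Indices of interior local maxima at or above min_height."""
    return [
        i for i in range(1, len(curve) - 1)
        if curve[i] > curve[i - 1]
        and curve[i] >= curve[i + 1]
        and curve[i] >= min_height
    ]


if __name__ == "__main__":
    # Hypothetical per-frame scores for two of the visual models.
    models = {
        "motion":   [0.1, 0.5, 0.2, 0.1, 0.9, 0.3],
        "contrast": [0.2, 0.2, 0.8, 0.2, 0.6, 0.1],
    }
    weights = {"motion": 1.0, "contrast": 1.0}
    fused = linear_fusion(models, weights)
    print(peak_frames(fused))  # candidate keyframe indices
```

In this sketch the peak indices of the fused curve play the role of keyframe candidates; running peak_frames on an individual model curve instead would give the per-model peaks that the abstract also mentions.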
