A generic framework of user attention model and its application in video summarization

Because video is highly redundant, automatically extracting essential video content is one of the key techniques for accessing and managing large video libraries. In this paper, we present a generic framework for a user attention model, which estimates the attention viewers may pay to video content. Because human attention is an effective and efficient mechanism for prioritizing and filtering information, the user attention model provides an effective approach to video indexing based on importance ranking. In particular, we define viewer attention through multiple sensory perceptions, i.e., visual and aural stimuli, together with partial semantic understanding. We also propose a set of modeling methods for visual and aural attention. As one important application of the user attention model, we implement a feasible video summarization solution that requires neither full semantic understanding of video content nor complex heuristic rules, and we use it to demonstrate the effectiveness, robustness, and generality of the model. The promising results of a user study on video summarization indicate that the user attention model offers an alternative route to video understanding.
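The idea of ranking video content by estimated attention can be illustrated with a minimal sketch. It is not the paper's method: the weighted linear fusion rule, the function names `fuse_attention` and `select_frames`, the weight `w_visual`, and the per-frame scores are all assumptions chosen for illustration. The abstract states only that visual and aural attention are modeled and that summarization follows from importance ranking.

```python
from typing import List, Sequence


def fuse_attention(visual: Sequence[float], aural: Sequence[float],
                   w_visual: float = 0.5) -> List[float]:
    """Combine per-frame visual and aural scores into one attention curve.

    Weighted linear fusion is an assumed rule, used here only for illustration.
    """
    if len(visual) != len(aural):
        raise ValueError("score sequences must have equal length")
    return [w_visual * v + (1.0 - w_visual) * a for v, a in zip(visual, aural)]


def select_frames(attention: Sequence[float], budget: int) -> List[int]:
    """Return the indices of the `budget` highest-scoring frames, in temporal order."""
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical per-frame scores in [0, 1]; real values would come from
# visual and aural attention models.
visual = [0.1, 0.8, 0.4, 0.9, 0.2]
aural = [0.3, 0.6, 0.2, 0.7, 0.1]

curve = fuse_attention(visual, aural)
summary = select_frames(curve, budget=2)
print(summary)
```

Keeping the selected indices in temporal order preserves the original sequence of events in the resulting summary, whatever scoring rule produces the attention curve.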
