Spatio-temporal attention model for video content analysis

This paper presents a new model of human attention that extracts salient areas from video frames. Because automatic understanding of the semantic content of video is still far from being achieved, attention models instead aim to mimic the focus of the human visual system. Most existing approaches extract image saliency for use in various applications, but their output is not compared with human perception. The model described here fuses a static model inspired by the human visual system with a moving object detection model. The static model has two steps: a "retinal" filtering followed by a "cortical" decomposition. Moving objects are detected after compensating for camera motion. The model's output on several videos is then compared with human judgment through a psychophysical experiment designed to validate it against human visual perception. The experimental results indicate that the model achieves a precision of about 88%, which shows the usefulness of the scheme and its potential for future applications.
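The fusion step can be illustrated with a minimal sketch. The abstract does not state how the static and motion outputs are combined, so the rule below (rescale each map to [0, 1], then take a weighted sum) is purely an assumption, and the names `normalize`, `fuse`, and `w_static` are hypothetical.

```python
from typing import List

Map = List[List[float]]  # a 2-D map stored as rows of pixel values


def normalize(m: Map) -> Map:
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse(static_map: Map, motion_map: Map, w_static: float = 0.5) -> Map:
    """Combine a static saliency map and a motion map of the same size.

    Assumption: a weighted sum of the normalized maps. The paper's actual
    fusion rule is not given in the abstract.
    """
    if not 0.0 <= w_static <= 1.0:
        raise ValueError("w_static must lie in [0, 1]")
    if len(static_map) != len(motion_map) or any(
        len(a) != len(b) for a, b in zip(static_map, motion_map)
    ):
        raise ValueError("maps must have the same shape")
    s = normalize(static_map)
    d = normalize(motion_map)
    return [
        [w_static * a + (1.0 - w_static) * b for a, b in zip(row_s, row_d)]
        for row_s, row_d in zip(s, d)
    ]


# Toy 2x2 example: a static map and a motion map with one moving pixel.
static_example = [[0.0, 2.0], [4.0, 8.0]]
motion_example = [[0.0, 0.0], [0.0, 1.0]]
fused_example = fuse(static_example, motion_example)
```

With equal weights, a pixel that is salient in both maps keeps the highest fused value, while a pixel that is salient only in the static map is attenuated. Other rules (maximum, product) would be equally plausible under the same assumptions.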
