Modelling salient visual dynamics in videos

Automatic video annotation is a critical step for content-based video retrieval and browsing. Automatically detecting the focus of interest in video frames can ease the tedious manual labeling process, but producing an appropriate extent for visually salient regions in video sequences remains challenging. In this work, we therefore propose a novel approach to modeling dynamic visual attention based on spatiotemporal analysis. Our model first detects salient points in three-dimensional video volumes and then uses these points as seeds to search for the extent of salient regions in a novel motion attention map. To determine the extent of an attended region, we apply the maximum-entropy principle in the spatial domain to analyze the dynamics derived from the spatiotemporal analysis. Experimental results show that the proposed dynamic visual attention model achieves a precision of 70% and remains robust across successive video volumes.
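The pipeline sketched in the abstract — build a motion attention map over a video volume, take its strongest responses as seed points, then grow the attended region until spatial entropy is maximized — can be illustrated with a minimal numpy-only toy. Note this is an assumption-laden simplification: the frame-differencing attention map, the square-window region growth, and all function names below are illustrative stand-ins, not the paper's actual spatiotemporal detector.

```python
import numpy as np

def motion_attention_map(volume):
    """Toy motion attention map: mean temporal-gradient magnitude per pixel
    (stand-in for the paper's spatiotemporal analysis)."""
    diffs = np.abs(np.diff(volume.astype(float), axis=0))
    return diffs.mean(axis=0)

def spatial_entropy(patch, bins=16):
    """Shannon entropy of the attention values inside a spatial window."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def salient_extent(att, seed, max_radius=20):
    """Grow a square window around the seed point and keep the radius
    whose contents maximize spatial entropy (the stopping criterion
    hinted at in the abstract)."""
    y, x = seed
    best_r, best_h = 1, -1.0
    for r in range(1, max_radius + 1):
        patch = att[max(0, y - r): y + r + 1, max(0, x - r): x + r + 1]
        h = spatial_entropy(patch)
        if h > best_h:
            best_h, best_r = h, r
    return best_r

# Synthetic 8-frame volume: static background, one small moving bright square.
T, H, W = 8, 64, 64
volume = np.zeros((T, H, W))
for t in range(T):
    volume[t, 30:38, 20 + t: 28 + t] = 1.0

att = motion_attention_map(volume)
att /= att.max()                                   # normalize to [0, 1]
seed = np.unravel_index(np.argmax(att), att.shape) # strongest motion response
radius = salient_extent(att, seed)                 # entropy-selected extent
```

On this toy input the seed lands inside the moving square's trajectory and the entropy criterion stops the window at a radius covering the motion region plus some surrounding background, which is the qualitative behavior the abstract describes.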
