Fusion of visual attention cues by machine learning

We propose a new computational scheme for visual attention modeling. It uses both low-level and high-level features to predict visual attention from a video signal and fuses these features by machine learning. We show that this scheme is more robust than schemes that rely on features from a single level. Unlike conventional techniques, it avoids perceptual mismatch between the estimated saliency and actual human fixations. We also show that selecting representative training samples according to the fixation distribution improves the efficacy of regression training. Experimental results demonstrate the advantages of the proposed scheme.
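As a rough illustration of the idea of learned fusion, the sketch below fits weights that map per-pixel feature values to a fixation density map, using training pixels drawn in proportion to that density. It is not the authors' implementation: the function names, the sample counts, and the use of ordinary least squares as the regressor are all assumptions made for this example, and the abstract does not specify the regressor or the features.

```python
import numpy as np

def sample_by_fixation(fix_map, n_pos, n_neg, rng):
    """Pick flat pixel indices: n_pos drawn in proportion to fixation
    density, n_neg drawn uniformly (hypothetical sampling rule)."""
    flat = fix_map.ravel().astype(float)
    p = flat / flat.sum()
    pos = rng.choice(flat.size, size=n_pos, replace=True, p=p)
    neg = rng.choice(flat.size, size=n_neg, replace=True)
    return np.concatenate([pos, neg])

def _design_matrix(feature_maps, idx=None):
    """Stack feature maps into columns and append a bias column."""
    cols = [f.ravel() if idx is None else f.ravel()[idx] for f in feature_maps]
    X = np.stack(cols, axis=1)
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_fusion(feature_maps, fix_map, idx):
    """Least-squares fusion weights (last entry is the bias).
    Ordinary least squares is a stand-in regressor, assumed here."""
    X = _design_matrix(feature_maps, idx)
    y = fix_map.ravel()[idx]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def fuse(feature_maps, w):
    """Apply the learned weights to every pixel to get a saliency map."""
    X = _design_matrix(feature_maps)
    return (X @ w).reshape(feature_maps[0].shape)
```

A typical call would pass one low-level map (for example, contrast) and one high-level map (for example, a face-detection response) for a frame, fit the weights on the sampled pixels, and then call `fuse` on unseen frames.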
