Making computers look the way we look: exploiting visual attention for image understanding

Human visual attention (HVA) is an important strategy for focusing on specific information while observing and understanding visual stimuli. HVA involves making a series of fixations on select locations while performing tasks such as object recognition and scene understanding. We present one of the first works that combines fixation information with automated concept detectors to (i) infer abstract image semantics, and (ii) enhance the performance of object detectors. We develop visual attention-based models that sample fixation distributions and fixation-transition distributions in regions-of-interest (ROIs) to infer abstract semantics such as expressive faces and interactions (e.g., look, read). We also exploit eye-gaze information to deduce the likely locations and scales of salient concepts and thereby aid state-of-the-art detectors. We demonstrate an 18% performance increase, with over 80% reduction in computational time, for a state-of-the-art object detector [4].
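To make the two ingredients of the abstract concrete, the sketch below shows one plausible realization: a Gaussian-smoothed fixation density map, an ROI-to-ROI fixation-transition matrix (a feature one could feed to a classifier for semantics such as interactions), and a thresholding step that turns gaze density into candidate boxes for a detector. All names (fixation_density_map, roi_transition_counts, gaze_rois) and parameters (sigma=25.0, threshold=0.5) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label, find_objects

def fixation_density_map(fixations, image_shape, sigma=25.0):
    """Place unit mass at each (x, y) fixation and smooth with a
    Gaussian; sigma (in pixels) is an illustrative choice."""
    h, w = image_shape
    density = np.zeros((h, w))
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:
            density[yi, xi] += 1.0
    density = gaussian_filter(density, sigma=sigma)
    return density / density.max() if density.max() > 0 else density

def roi_transition_counts(fixations, rois):
    """Count fixation transitions between ROIs. The resulting matrix
    summarizes gaze dynamics (e.g., alternation between two faces)
    and could serve as a feature for inferring interaction semantics.
    rois: list of (x, y, w, h) boxes; fixations: ordered (x, y) points."""
    def roi_index(pt):
        for i, (rx, ry, rw, rh) in enumerate(rois):
            if rx <= pt[0] < rx + rw and ry <= pt[1] < ry + rh:
                return i
        return None
    n = len(rois)
    transitions = np.zeros((n, n))
    seq = [roi_index(f) for f in fixations]
    for a, b in zip(seq, seq[1:]):
        if a is not None and b is not None:
            transitions[a, b] += 1
    return transitions

def gaze_rois(density, threshold=0.5):
    """Threshold the density map and return connected blobs as
    (x, y, w, h) boxes: candidate locations and scales for a detector."""
    labels, _ = label(density >= threshold)
    boxes = []
    for sl in find_objects(labels):
        y0, x0 = sl[0].start, sl[1].start
        boxes.append((x0, y0, sl[1].stop - x0, sl[0].stop - y0))
    return boxes
```

In use, a sliding-window detector (e.g., a part-based model in the style of [13]) would score windows only inside the gaze-derived boxes, at scales matched to the box sizes. Pruning the search space this way is one plausible route to the kind of speed-up and accuracy gain the abstract reports, though the paper's actual mechanism may differ.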

[1] J. Henderson, "Human gaze control during real-world scene perception," Trends in Cognitive Sciences, 2003.

[2] P. A. Viola et al., "Robust Real-Time Face Detection," International Journal of Computer Vision, 2001.

[3] A. Iyer et al., "Components of bottom-up gaze allocation in natural images," Vision Research, 2005.

[4] D. Salesin et al., "Gaze-based interaction for semi-automatic photo cropping," CHI, 2006.

[5] C. N. Canagarajah et al., "Towards efficient context-specific video coding based on gaze-tracking analysis," ACM TOMCCAP, 2007.

[6] P. Perona et al., "Some Objects Are More Equal Than Others: Measuring and Predicting Importance," ECCV, 2008.

[7] P. Perona et al., "Objects predict fixations better than early saliency," Journal of Vision, 2008.

[8] N. Sebe et al., "Image saliency by isocentric curvedness and color," IEEE ICCV, 2009.

[9] T.-S. Chua et al., "Automated localization of affective objects and actions in images via caption text-cum-eye gaze analysis," ACM Multimedia, 2009.

[10] L. F. Cheong et al., "Active segmentation with fixation," IEEE ICCV, 2009.

[11] M. R. Morris et al., "What do you see when you're surfing? Using eye tracking to predict salient regions of web pages," CHI, 2009.

[12] F. Durand et al., "Learning to predict where humans look," IEEE ICCV, 2009.

[13] D. A. McAllester et al., "Object Detection with Discriminatively Trained Part-Based Models," IEEE TPAMI, 2010.

[14] Q. Ji et al., "In the Eye of the Beholder: A Survey of Models for Eyes and Gaze," IEEE TPAMI, 2010.

[15] H. Katti et al., "An Eye Fixation Database for Saliency Detection in Images," ECCV, 2010.