Automated localization of affective objects and actions in images via caption text-cum-eye gaze analysis

We propose a novel framework to localize and label affective objects and actions in images through a combination of textual, visual, and gaze-based analysis. Human gaze provides useful cues for inferring the locations and interactions of affective objects. While the concepts (labels) associated with an image can be determined from its caption, we demonstrate how these concepts can be localized by learning a statistical affect model for world concepts. The affect model is derived from non-invasively acquired fixation patterns on labeled images, and it guides the localization of affective objects (e.g., faces, reptiles) and actions (e.g., look, read) from fixations in unlabeled images. Experimental results on a database of 500 images confirm the effectiveness and promise of the proposed approach.
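To make the two-stage idea concrete, here is a minimal sketch, not the authors' implementation: it assumes the affect model is summarized by per-concept fixation-duration statistics learned from labeled images, and that localization in an unlabeled image amounts to clustering fixations spatially and assigning each cluster to the caption-derived concept whose duration statistics best explain it. All function names, the Gaussian duration model, and the plain k-means step are illustrative assumptions.

```python
import numpy as np

def fit_affect_model(training_fixations):
    """Hypothetical affect model: per-concept (mean, std) of fixation
    duration, fit from {concept: [durations in ms]} on labeled images."""
    return {c: (float(np.mean(d)), float(np.std(d)) + 1e-6)
            for c, d in training_fixations.items()}

def localize_concepts(fixations, caption_concepts, model,
                      n_clusters=2, iters=20, seed=0):
    """fixations: (N, 3) array of (x, y, duration) for one unlabeled image.
    caption_concepts: concepts extracted from the image caption.
    Returns {concept: estimated (x, y) location}."""
    xy, dur = fixations[:, :2], fixations[:, 2]

    # Plain k-means on fixation locations (illustrative clustering choice).
    rng = np.random.default_rng(seed)
    centers = xy[rng.choice(len(xy), n_clusters, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((xy[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([xy[labels == k].mean(0) if (labels == k).any()
                            else centers[k] for k in range(n_clusters)])

    results = {}
    for k in range(n_clusters):
        d = dur[labels == k]
        if d.size == 0:
            continue
        # Gaussian log-likelihood (up to a constant) of the cluster's
        # fixation durations under each caption concept's affect model.
        def loglik(c):
            mu, sigma = model[c]
            return (-((d - mu) ** 2 / (2 * sigma ** 2)).sum()
                    - d.size * np.log(sigma))
        best = max(caption_concepts, key=loglik)
        results[best] = tuple(centers[k])  # concept -> image location
    return results
```

Under these assumptions, one would call `fit_affect_model` once on the labeled training fixations and then `localize_concepts` per test image with the concepts parsed from its caption; a fuller treatment would also use visual features and resolve cases where two clusters compete for the same concept.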
