Image description with a goal: Building efficient discriminating expressions for images

Much work in computer vision addresses tasks such as object detection, scene recognition, and attribute detection, either separately or as a joint problem. In recent years, there has been growing interest in combining the results of these tasks to produce a textual description of a scene. However, a scene contains many items that could be mentioned. A description that included every object, relationship, and attribute in the image would be extremely long and would not convey a true understanding of the image. We present a novel approach to ranking the importance of the items to be described. Specifically, we focus on the task of discriminating one image from a group of others, and we investigate the factors that contribute to the most efficient description for this task. We also provide a quantitative method for measuring description quality on this task using data from human subjects, and we show that our method achieves better results than baseline methods.
