Measuring and Predicting Visual Importance of Similar Objects

Similar objects are ubiquitous in both natural and artificial scenes. Determining the visual importance of several similar objects in a complex photograph is a challenge for image understanding algorithms. This study aims to define the importance of similar objects in an image and to develop a method that selects the most important instances among the multiple similar objects in an input image. The task is difficult because the objects must be compared without adequate semantic information. We address it by building an image database and designing an interactive system that measures object importance from human observers. This ground truth is used to define a range of features related to the visual importance of similar objects. These features are then used with learning-to-rank and random forest models to rank the similar objects in an image. Importance predictions were validated on 5,922 objects; the most important objects can be identified automatically. Factors related to composition (e.g., size, location, and overlap) are particularly informative, although clarity and color contrast are also important. We demonstrate the usefulness of similar-object importance in several applications, including image retargeting, image compression, image re-attentionizing, image admixture, and manipulation of change blindness images.
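To make the ranking step concrete, the following is a minimal sketch of pairwise learning-to-rank over per-object features. It is not the method of this study: the linear logistic model, the three feature columns (relative size, centrality, visible fraction), the object names, the feature values, and the preference pairs are all hypothetical and chosen only for illustration.

```python
import math

def train_pairwise_ranker(objects, pairs, epochs=200, lr=0.1):
    """Fit linear weights w so that score(a) > score(b) for each (a, b) pair.

    objects: dict mapping object name -> list of feature values
    pairs:   list of (more_important, less_important) name tuples
    """
    dim = len(next(iter(objects.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for a, b in pairs:
            # Feature difference between the preferred and the other object.
            diff = [xa - xb for xa, xb in zip(objects[a], objects[b])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient step on the logistic loss log(1 + exp(-margin)).
            g = 1.0 / (1.0 + math.exp(margin))
            w = [wi + lr * g * di for wi, di in zip(w, diff)]
    return w

def rank_objects(objects, w):
    """Return object names sorted from most to least important."""
    scores = {name: sum(wi * xi for wi, xi in zip(w, x))
              for name, x in objects.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical features: [relative size, centrality, visible fraction].
objects = {
    "apple_1": [0.30, 0.90, 1.00],
    "apple_2": [0.20, 0.60, 0.80],
    "apple_3": [0.10, 0.30, 0.50],
    "apple_4": [0.05, 0.10, 0.90],
}
# Hypothetical observer preferences (first item judged more important).
pairs = [
    ("apple_1", "apple_2"),
    ("apple_2", "apple_3"),
    ("apple_1", "apple_3"),
    ("apple_3", "apple_4"),
    ("apple_1", "apple_4"),
]

weights = train_pairwise_ranker(objects, pairs)
ranking = rank_objects(objects, weights)
print(ranking)
```

In this toy setup the learned weights turn composition-style features into a single score per object, and sorting by that score gives the predicted importance order. A random forest regressor trained on the same features would be an alternative scoring model.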
