High level describable attributes for predicting aesthetics and interestingness

With the rise in popularity of digital cameras, the amount of visual data available on the web is growing exponentially. Some of these pictures are extremely beautiful and aesthetically pleasing, but the vast majority are uninteresting or of low quality. This paper demonstrates a simple, yet powerful method to automatically select high aesthetic quality images from large image collections. Our aesthetic quality estimation method explicitly predicts some of the possible image cues that a human might use to evaluate an image and then uses them in a discriminative approach. These cues or high level describable image attributes fall into three broad types: 1) compositional attributes related to image layout or configuration, 2) content attributes related to the objects or scene types depicted, and 3) sky-illumination attributes related to the natural lighting conditions. We demonstrate that an aesthetics classifier trained on these describable attributes can provide a significant improvement over baseline methods for predicting human quality judgments. We also demonstrate our method for predicting the “interestingness” of Flickr photos, and introduce a novel problem of estimating query specific “interestingness”.

[1]  C. Koch,et al.  Computational modelling of visual attention , 2001, Nature Reviews Neuroscience.

[2]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[3]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[4]  Jingrui He,et al.  Classification of Digital Photos Taken by Photographers or Home Users , 2004, PCM.

[5]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[6]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[7]  Wei-Ying Ma,et al.  Learning No-Reference Quality Metric by Examples , 2005, 11th International Multimedia Modelling Conference.

[8]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Alexei A. Efros,et al.  Geometric context from a single image , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Fatih Murat Porikli,et al.  Integral histogram: a fast way to extract histograms in Cartesian spaces , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Yan Ke,et al.  The Design of High-Level Features for Photo Quality Assessment , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  James Ze Wang,et al.  Studying Aesthetics in Photographic Images Using a Computational Approach , 2006, ECCV.

[14]  David A. Forsyth,et al.  Animals on the Web , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Andrew Zisserman,et al.  Learning Visual Attributes , 2007, NIPS.

[16]  Östen Axelsson Towards a Psychology of Photography: Dimensions Underlying Aesthetic Appeal of Photographs , 2007, Perceptual and motor skills.

[17]  Learning to Detect A Salient Object , 2007, CVPR.

[18]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  James Ze Wang,et al.  Algorithmic inferencing of aesthetics and emotion in natural images: An exposition , 2008, 2008 15th IEEE International Conference on Image Processing.

[20]  Xiaoou Tang,et al.  Photo and Video Quality Evaluation: Focusing on the Subject , 2008, ECCV.

[21]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  Rongrong Ji,et al.  Photo assessment based on computational visual attention model , 2009, ACM Multimedia.

[25]  Karen B. Schloss,et al.  The role of spatial composition in preference for color pairs , 2010 .

[26]  Charless C. Fowlkes,et al.  Exploring aesthetic principles of spatial composition through stock photography , 2010 .