Perception-Guided Multimodal Feature Fusion for Photo Aesthetics Assessment

Photo aesthetic quality evaluation is a challenging task in multimedia and computer vision fields. Conventional approaches suffer from the following three drawbacks: 1) the deemphasized role of semantic content that is many times more important than low-level visual features in photo aesthetics; 2) the difficulty to optimally fuse low-level and high-level visual cues in photo aesthetics evaluation; and 3) the absence of a sequential viewing path in the existing models, as humans perceive visually salient regions sequentially when viewing a photo. To solve these problems, we propose a new aesthetic descriptor that mimics humans sequentially perceiving visually/semantically salient regions in a photo. In particular, a weakly supervised learning paradigm is developed to project the local aesthetic descriptors (graphlets in this work) into a low-dimensional semantic space. Thereafter, each graphlet can be described by multiple types of visual features, both at low-level and in high-level. Since humans usually perceive only a few salient regions in a photo, a sparsity-constrained graphlet ranking algorithm is proposed that seamlessly integrates both the low-level and the high-level visual cues. Top-ranked graphlets are those visually/semantically prominent graphlets in a photo. They are sequentially linked into a path that simulates the process of humans actively viewing. Finally, we learn a probabilistic aesthetic measure based on such actively viewing paths (AVPs) from the training photos that are marked as aesthetically pleasing by multiple users. Experimental results show that: 1) the AVPs are 87.65% consistent with real human gaze shifting paths, as verified by the eye-tracking data; and 2) our photo aesthetic measure outperforms many of its competitors.

[1]  Yongdong Zhang,et al.  Multiview Spectral Embedding , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[2]  Chun Chen,et al.  Active Learning Based on Locally Linear Reconstruction , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Gabriela Csurka,et al.  Assessing the aesthetic quality of photographs using generic image descriptors , 2011, 2011 International Conference on Computer Vision.

[4]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[5]  Mubarak Shah,et al.  A framework for photo-quality assessment and enhancement based on visual aesthetics , 2010, ACM Multimedia.

[6]  Masashi Nishiyama,et al.  Aesthetic quality classification of photographs based on color harmony , 2011, CVPR 2011.

[7]  Ming Ouhyoung,et al.  Personalized photograph ranking and selection system , 2010, ACM Multimedia.

[8]  Xiao Liu,et al.  Probabilistic Graphlet Transfer for Photo Cropping , 2013, IEEE Transactions on Image Processing.

[9]  John K. Tsotsos,et al.  Saliency, attention, and visual search: an information theoretic approach. , 2009, Journal of vision.

[10]  Ali Borji,et al.  Boosting bottom-up and top-down visual features for saliency estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  James Ze Wang,et al.  Studying Aesthetics in Photographic Images Using a Computational Approach , 2006, ECCV.

[12]  Yan Ke,et al.  The Design of High-Level Features for Photo Quality Assessment , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Yong Yu,et al.  Robust Subspace Segmentation by Low-Rank Representation , 2010, ICML.

[16]  Bingbing Ni,et al.  Learning to photograph , 2010, ACM Multimedia.

[17]  Dacheng Tao,et al.  Subspaces Indexing Model on Grassmann Manifold for Image Search , 2011, IEEE Transactions on Image Processing.

[18]  Mingli Song,et al.  Automatic image cropping using sparse coding , 2011, The First Asian Conference on Pattern Recognition.

[19]  Yoichi Sato,et al.  Sensation-based photo cropping , 2009, ACM Multimedia.

[20]  Xiaogang Wang,et al.  Content-based photo quality assessment , 2011, 2011 International Conference on Computer Vision.

[21]  Markus A. Stricker,et al.  Similarity of color images , 1995, Electronic Imaging.

[22]  Christof Koch,et al.  Image Signature: Highlighting Sparse Salient Regions , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Vicente Ordonez,et al.  High level describable attributes for predicting aesthetics and interestingness , 2011, CVPR 2011.

[24]  Xiao Liu,et al.  Semi-supervised Node Splitting for Random Forest Construction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[26]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  W. Chu Studying Aesthetics in Photographic Images Using a Computational Approach , 2013 .

[29]  Benjamin Z. Yao,et al.  Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.

[30]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[31]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Xuelong Li,et al.  Actively Learning Human Gaze Shifting Paths for Semantics-Aware Photo Cropping , 2014, IEEE Transactions on Image Processing.