Paper information - Perception-Guided Multimodal Feature Fusion for Photo Aesthetics Assessment

Photo aesthetic quality evaluation is a challenging task in multimedia and computer vision. Conventional approaches suffer from three drawbacks: 1) they deemphasize semantic content, which is many times more important than low-level visual features in photo aesthetics; 2) they have difficulty optimally fusing low-level and high-level visual cues; and 3) they lack a sequential viewing path, even though humans perceive visually salient regions sequentially when viewing a photo. To solve these problems, we propose a new aesthetic descriptor that mimics how humans sequentially perceive visually/semantically salient regions in a photo. In particular, a weakly supervised learning paradigm is developed to project the local aesthetic descriptors (graphlets in this work) into a low-dimensional semantic space. Thereafter, each graphlet can be described by multiple types of visual features, both low-level and high-level. Since humans usually perceive only a few salient regions in a photo, a sparsity-constrained graphlet ranking algorithm is proposed that seamlessly integrates the low-level and high-level visual cues. The top-ranked graphlets are the visually/semantically prominent ones in a photo. They are sequentially linked into a path that simulates the process of humans actively viewing a photo. Finally, we learn a probabilistic aesthetic measure based on such actively viewing paths (AVPs) from training photos that are marked as aesthetically pleasing by multiple users. Experimental results show that: 1) the AVPs are 87.65% consistent with real human gaze shifting paths, as verified by eye-tracking data; and 2) our photo aesthetic measure outperforms many of its competitors.
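The step of linking top-ranked graphlets into a viewing path can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each region already has a ranking score (the sparsity-constrained ranking itself is not reproduced), it assumes the path visits regions in descending score order, and the names `build_viewing_path` and `path_length` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def build_viewing_path(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring regions, best first.

    Assumption: the viewing order follows descending score. The abstract
    only says top-ranked graphlets are "sequentially linked".
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def path_length(path: Sequence[int], centers: Sequence[Point]) -> float:
    """Sum of Euclidean distances between consecutive region centers."""
    return sum(
        math.dist(centers[a], centers[b]) for a, b in zip(path, path[1:])
    )


# Toy data (hypothetical): four regions with scores and center coordinates.
scores = [0.1, 0.9, 0.4, 0.7]
centers = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (0.0, 4.0)]

path = build_viewing_path(scores, k=3)   # [1, 3, 2]
length = path_length(path, centers)      # 3.0 + 5.0 = 8.0
```

Keeping only `k` regions mirrors the observation that humans perceive only a few salient regions in a photo; in this sketch `k` is a free parameter rather than something learned.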