RAPID: Rating Pictorial Aesthetics using Deep Learning

Effective visual features are essential for computational aesthetic quality rating systems. Existing methods used machine learning and statistical modeling techniques on handcrafted features or generic image descriptors. A recently-published large-scale dataset, the AVA dataset, has further empowered machine learning based approaches. We present the RAPID (RAting PIctorial aesthetics using Deep learning) system, which adopts a novel deep neural network approach to enable automatic feature learning. The central idea is to incorporate heterogeneous inputs generated from the image, which include a global view and a local view, and to unify the feature learning and classifier training using a double-column deep convolutional neural network. In addition, we utilize the style attributes of images to help improve the aesthetic quality categorization accuracy. Experimental results show that our approach significantly outperforms the state of the art on the AVA dataset.

[1]  Florent Perronnin,et al.  Learning beautiful (and ugly) attributes , 2013, BMVC.

[2]  Gabriela Csurka,et al.  Assessing the aesthetic quality of photographs using generic image descriptors , 2011, 2011 International Conference on Computer Vision.

[3]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[4]  Yan Ke,et al.  The Design of High-Level Features for Photo Quality Assessment , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[5]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Masashi Nishiyama,et al.  Aesthetic quality classification of photographs based on color harmony , 2011, CVPR 2011.

[7]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[8]  Trevor Darrell,et al.  Recognizing Image Style , 2013, BMVC.

[9]  Xiaogang Wang,et al.  Hybrid Deep Learning for Face Verification , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Xiaoou Tang,et al.  Photo and Video Quality Evaluation: Focusing on the Subject , 2008, ECCV.

[11]  Mubarak Shah,et al.  A framework for photo-quality assessment and enhancement based on visual aesthetics , 2010, ACM Multimedia.

[12]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[13]  Aaron Hertzmann,et al.  Color compatibility from large datasets , 2011, ACM Trans. Graph..

[14]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[15]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[16]  Honglak Lee,et al.  Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising , 2013, NIPS.

[17]  Jiebo Luo,et al.  Aesthetics and Emotions in Images , 2011, IEEE Signal Processing Magazine.

[18]  Raffay Hamid,et al.  What makes an image popular? , 2014, WWW.

[19]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[20]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Shao-Yi Chien,et al.  Scenic photo quality assessment with bag of aesthetics-preserving features , 2011, ACM Multimedia.

[22]  Walter Niekamp,et al.  An exploratory investigation into factors affecting visual balance , 1981 .

[23]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[24]  Xiaogang Wang,et al.  Content-based photo quality assessment , 2011, 2011 International Conference on Computer Vision.

[25]  Vicente Ordonez,et al.  High level describable attributes for predicting aesthetics and interestingness , 2011, CVPR 2011.

[26]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[27]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  R. Arnheim Art and Visual Perception, a Psychology of the Creative Eye , 1967 .

[29]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[30]  James Ze Wang,et al.  On shape and the computability of emotions , 2012, ACM Multimedia.

[31]  James Ze Wang,et al.  Studying Aesthetics in Photographic Images Using a Computational Approach , 2006, ECCV.