Composition-Preserving Deep Photo Aesthetics Assessment

Photo aesthetics assessment is challenging. Deep convolutional neural network (ConvNet) methods have recently shown promising results for aesthetics assessment. The performance of these deep ConvNet methods, however, is often compromised by the constraint that the neural network only takes the fixed-size input. To accommodate this requirement, input images need to be transformed via cropping, scaling, or padding, which often damages image composition, reduces image resolution, or causes image distortion, thus compromising the aesthetics of the original images. In this paper, we present a composition-preserving deep Con-vNet method that directly learns aesthetics features from the original input images without any image transformations. Specifically, our method adds an adaptive spatial pooling layer upon the regular convolution and pooling layers to directly handle input images with original sizes and aspect ratios. To allow for multi-scale feature extraction, we develop the Multi-Net Adaptive Spatial Pooling ConvNet architecture which consists of multiple sub-networks with different adaptive spatial pooling sizes and leverage a scene-based aggregation layer to effectively combine the predictions from multiple sub-networks. Our experiments on the large-scale aesthetics assessment benchmark (AVA [29]) demonstrate that our method can significantly improve the state-of-the-art results in photo aesthetics assessment.

[1]  Xin Li,et al.  Blind image quality assessment , 2002, Proceedings. International Conference on Image Processing.

[2]  Wilson S. Geisler,et al.  Image quality assessment based on a degradation model , 2000, IEEE Trans. Image Process..

[3]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Mubarak Shah,et al.  A framework for photo-quality assessment and enhancement based on visual aesthetics , 2010, ACM Multimedia.

[5]  Xiaoou Tang,et al.  Photo and Video Quality Evaluation: Focusing on the Subject , 2008, ECCV.

[6]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7]  James Zijun Wang,et al.  RAPID: Rating Pictorial Aesthetics using Deep Learning , 2014, ACM Multimedia.

[8]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[9]  Zhi-Hua Zhou,et al.  Ensemble Methods: Foundations and Algorithms , 2012 .

[10]  Masashi Nishiyama,et al.  Aesthetic quality classification of photographs based on color harmony , 2011, CVPR 2011.

[11]  Zhan Ma,et al.  Perceptual Quality Assessment of Video Considering Both Frame Rate and Quantization Artifacts , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[12]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.

[13]  James Zijun Wang,et al.  Rating Image Aesthetics Using Deep Learning , 2015, IEEE Transactions on Multimedia.

[14]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[16]  Vicente Ordonez,et al.  High level describable attributes for predicting aesthetics and interestingness , 2011, CVPR 2011.

[17]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Jiebo Luo,et al.  Aesthetics and Emotions in Images , 2011, IEEE Signal Processing Magazine.

[19]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Wei-Ying Ma,et al.  Learning No-Reference Quality Metric by Examples , 2005, 11th International Multimedia Modelling Conference.

[21]  Alan C. Bovik,et al.  No-reference quality assessment using natural scene statistics: JPEG2000 , 2005, IEEE Transactions on Image Processing.

[22]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[24]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[25]  Alexander C. Loui,et al.  Automatic aesthetic value assessment in photographic images , 2010, 2010 IEEE International Conference on Multimedia and Expo.

[26]  Yan Ke,et al.  The Design of High-Level Features for Photo Quality Assessment , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Trevor Darrell,et al.  Recognizing Image Style , 2013, BMVC.

[28]  Patrick Le Callet,et al.  Considering Temporal Variations of Spatial Visual Distortions in Video Quality Assessment , 2009, IEEE Journal of Selected Topics in Signal Processing.

[29]  Jianbo Shi,et al.  DeepEdge: A multi-scale bifurcated deep network for top-down contour detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[31]  Xiaogang Wang,et al.  Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Xiaogang Wang,et al.  DeepID-Net: Deformable deep convolutional neural networks for object detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Ashish Kapoor,et al.  Blind Image Quality Assessment Using Semi-supervised Rectifier Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Jun Gao,et al.  Learning to predict the perceived visual quality of photos , 2011, 2011 International Conference on Computer Vision.

[36]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yuzhen Niu,et al.  What Makes a Professional Video? A Computational Aesthetics Approach , 2012, IEEE Transactions on Circuits and Systems for Video Technology.

[38]  Wei Luo,et al.  Content-Based Photo Quality Assessment , 2013, IEEE Trans. Multim..

[39]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Stephen Lin,et al.  Data-driven depth map refinement via multi-scale sparse representation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Alan C. Bovik,et al.  Motion Tuned Spatio-Temporal Quality Assessment of Natural Videos , 2010, IEEE Transactions on Image Processing.

[43]  Frederic T. b. Blanchard,et al.  The art of composition , 1934 .

[44]  Shao-Yi Chien,et al.  Scenic photo quality assessment with bag of aesthetics-preserving features , 2011, ACM Multimedia.

[45]  W. Chu Studying Aesthetics in Photographic Images Using a Computational Approach , 2013 .

[46]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[47]  Yizhou Yu,et al.  Visual saliency based on multiscale deep features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Gabriela Csurka,et al.  Assessing the aesthetic quality of photographs using generic image descriptors , 2011, 2011 International Conference on Computer Vision.

[49]  Radomír Mech,et al.  Deep Multi-patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[50]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[51]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[52]  Tiago Rosa Maria Paula Queluz,et al.  No-Reference Quality Assessment of H.264/AVC Encoded Video , 2010, IEEE Transactions on Circuits and Systems for Video Technology.

[53]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  R. SheikhH.,et al.  No-reference quality assessment using natural scene statistics , 2005 .

[55]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.