Attention-based Multi-Patch Aggregation for Image Aesthetic Assessment

Aggregation structures that use explicit information, such as image attributes and scene semantics, are effective and widely used in systems that assess the aesthetics of visual data. However, such information is often unavailable because manual annotation and expert design are costly. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Unlike state-of-the-art methods, which augment an MP aggregation network with various visual attributes, we train the model end to end using aesthetic labels only (i.e., aesthetically positive or negative). We achieve this with an attention-based mechanism that adaptively adjusts the weight of each patch during training to improve learning efficiency. In addition, we propose a set of objectives based on three typical attention mechanisms (average, minimum, and adaptive) and evaluate their effectiveness on the Aesthetic Visual Analysis (AVA) benchmark. Numerical results show that our approach outperforms existing methods by a large margin. We further verify the effectiveness of the proposed attention-based objectives via ablation studies, which shed light on the design of aesthetic assessment systems.
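
To make the idea of patch-level weighting concrete, the following is a minimal sketch of how the three weighting schemes named in the abstract (average, minimum, and adaptive) might combine per-patch losses into one image-level loss. It is an illustration under stated assumptions, not the formulation used in the paper: the function names, the cross-entropy patch loss, the `gamma` parameter, and in particular the focal-style adaptive weight `(1 - p) ** gamma` are all hypothetical choices made here.

```python
import math

EPS = 1e-12  # assumed clamp to avoid log(0); not from the paper


def patch_losses(probs):
    """Cross-entropy per patch, given each patch's predicted
    probability for the image's ground-truth aesthetic label."""
    return [-math.log(max(p, EPS)) for p in probs]


def aggregate(probs, mode="average", gamma=2.0):
    """Combine patch losses into one image-level loss.

    Returns (loss, weights). All three schemes are hypothetical
    readings of the abstract's "average, minimum, and adaptive".
    """
    n = len(probs)
    if n == 0:
        raise ValueError("at least one patch is required")
    losses = patch_losses(probs)
    uniform = [1.0 / n] * n

    if mode == "average":
        # Every patch contributes equally.
        weights = uniform
    elif mode == "minimum":
        # All weight on the patch with the lowest probability
        # for the true label (the hardest patch).
        hardest = min(range(n), key=lambda i: probs[i])
        weights = [1.0 if i == hardest else 0.0 for i in range(n)]
    elif mode == "adaptive":
        # Assumed focal-style weighting: harder patches get more weight.
        raw = [(1.0 - p) ** gamma for p in probs]
        total = sum(raw)
        weights = [r / total for r in raw] if total > 0 else uniform
    else:
        raise ValueError(f"unknown mode: {mode}")

    loss = sum(w * l for w, l in zip(weights, losses))
    return loss, weights


if __name__ == "__main__":
    # Three patches from one image, with made-up probabilities.
    probs = [0.9, 0.6, 0.2]
    for mode in ("average", "minimum", "adaptive"):
        loss, weights = aggregate(probs, mode)
        print(mode, round(loss, 4), [round(w, 4) for w in weights])
```

In this sketch the adaptive scheme sits between the other two: it emphasizes poorly predicted patches, as the minimum scheme does, while still letting every patch contribute to the loss.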
