Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment

Neural networks for Image Aesthetic Assessment are usually initialized with the weights of pretrained ImageNet models and then trained on a labeled image aesthetics dataset. We argue that the ImageNet classification task is not well suited for pretraining, since content-based classification is designed to make the model invariant to features that strongly influence an image’s aesthetics, e.g. style-based features such as brightness or contrast. We propose self-supervised, aesthetic-aware pretext tasks that let the network learn aesthetically relevant features, based on the observation that distorting aesthetic images with image filters usually reduces their appeal. To ensure that images are not accidentally improved when filters are applied, we introduce a large dataset composed of highly aesthetic images as the starting point for the distortions. The network is then trained to rank less distorted images higher than their more distorted counterparts. To exploit the effects of multiple objectives, we also embed this task into a multi-task setting by adding either a self-supervised classification task or a self-supervised regression task. In our experiments, we show that our pretraining improves performance over the ImageNet initialization and reduces the number of epochs until convergence by up to 47%. Additionally, we can match the performance of an ImageNet-initialized model while using 20% less labeled training data. We make our code, data, and pretrained models available.
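The ranking pretext task can be illustrated with a minimal, plain-Python sketch. Everything in it is an assumption for illustration, not the authors' released code: the brightness filter `adjust_brightness`, the hinge-style `margin_ranking_loss` with `margin=0.1`, and the weighted sum in `multitask_loss` (with its `aux_weight`) are hypothetical choices. The list of scores stands in for the network's outputs on copies of the same image at increasing distortion strengths.

```python
"""Minimal sketch of a distortion-ranking pretext task (all names hypothetical)."""
from itertools import combinations


def adjust_brightness(pixels, strength):
    # Hypothetical example filter: darken pixel values in [0, 1] by the
    # factor (1 - strength); strength 0.0 leaves the image unchanged.
    return [p * (1.0 - strength) for p in pixels]


def make_ranked_pairs(num_levels):
    # Distortion levels are indexed 0, 1, ...; for every pair (i, j) with
    # i < j, level i is less distorted and should receive the higher score.
    return list(combinations(range(num_levels), 2))


def margin_ranking_loss(score_better, score_worse, margin=0.1):
    # Assumed hinge loss: zero once the less distorted image outscores
    # the more distorted one by at least `margin`.
    return max(0.0, margin - (score_better - score_worse))


def ranking_loss(scores, margin=0.1):
    # scores[k] is the model's score for the image at distortion level k;
    # the loss is averaged over all ordered pairs of levels.
    pairs = make_ranked_pairs(len(scores))
    total = sum(margin_ranking_loss(scores[i], scores[j], margin) for i, j in pairs)
    return total / len(pairs)


def multitask_loss(rank_loss, aux_loss, aux_weight=1.0):
    # Assumed combination: ranking loss plus a weighted auxiliary
    # self-supervised classification or regression loss.
    return rank_loss + aux_weight * aux_loss


if __name__ == "__main__":
    pixels = [0.5, 0.8, 1.0]
    # Toy "model": score an image by its mean brightness.
    scores = [sum(img) / len(img)
              for img in (adjust_brightness(pixels, s) for s in (0.0, 0.25, 0.5))]
    print(ranking_loss(scores))  # prints 0.0
```

With this toy scorer, progressively darkened copies are already ordered correctly by a gap larger than the margin, so the ranking loss is zero; a real network would instead be trained to drive this loss down, and a pair scored in the wrong order (e.g. `ranking_loss([0.2, 0.5])`) yields a positive penalty.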
