Multi-label Recognition of Paintings with Cascaded Attention Network

Convolutional neural networks (CNNs) have achieved strong performance on multi-label image classification. However, recognizing the labels of paintings remains challenging because collecting and labeling a painting training set is very costly. Inspired by the similarity between natural images and paintings, we propose a progressive-learning approach that addresses this problem using only a few labeled paintings. In addition, we build an effective multi-label image classification framework on visual cascaded attention. Unlike existing approaches, the proposed model extracts and integrates multi-scale features to learn discriminative feature representations, which are then fed to a class-wise attention module through a simple scheme. Experimental results on the challenging MS-COCO benchmark show that the proposed model outperforms state-of-the-art models. We also demonstrate the effectiveness of the model on the painting test datasets we constructed (the datasets will be made publicly available soon).
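To make the idea of feeding fused multi-scale features to a class-wise attention module more concrete, the following is a minimal sketch in plain Python. It is not the authors' architecture: the function names (`fuse_scales`, `class_wise_attention`), the concatenation-based fusion, the softmax-over-locations attention, and the toy numbers are all assumptions chosen for illustration, and the abstract does not specify any of these details.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map a logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scales(coarse, fine):
    """Concatenate per-location feature vectors from two scales.

    Assumption: both inputs were already resized to the same number
    of spatial locations; each location is a list of floats.
    """
    if len(coarse) != len(fine):
        raise ValueError("scales must cover the same locations")
    return [c + f for c, f in zip(coarse, fine)]


def class_wise_attention(features, class_weights):
    """Return one logit per class.

    For each class, every location is scored with that class's weight
    vector, the scores are normalized into an attention map over
    locations, and the logit is the attention-weighted sum of scores.
    """
    logits = []
    for w in class_weights:
        scores = [sum(wi * fi for wi, fi in zip(w, feat)) for feat in features]
        attn = softmax(scores)
        logits.append(sum(a * s for a, s in zip(attn, scores)))
    return logits


# Toy example: two spatial locations, one channel per scale (hypothetical values).
coarse = [[1.0], [0.0]]
fine = [[0.0], [1.0]]
fused = fuse_scales(coarse, fine)

# Two hypothetical classes; the second has zero weights, so its logit is 0.
class_weights = [[2.0, 0.0], [0.0, 0.0]]
logits = class_wise_attention(fused, class_weights)
probs = [sigmoid(z) for z in logits]
```

In this sketch each class gets its own attention map, so a label tied to a small region can score highly even when most locations are irrelevant, and the sigmoid keeps the labels independent, as multi-label classification requires.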
