Bi-Modal Learning With Channel-Wise Attention for Multi-Label Image Classification

Multi-label image classification is closer to real-world applications than single-label classification. The problem is difficult because a complex label space makes it hard to obtain label-level attention regions and to handle the semantic relationships among labels. Common deep network-based methods use a CNN to extract features and treat the labels as a sequence or a graph, handling label correlations with an RNN or graph-theoretical algorithms. In this paper, we propose a novel CNN-RNN-based model, the bi-modal multi-label learning (BMML) framework. First, an improved channel-wise attention mechanism is presented to propose regional attention maps and connect them to their related labels. Then, based on the assumption that objects in a semantic scene are always highly related in both visual and textual corpora, we embed the labels through different pre-trained language models and determine the label sequence in a “semantic space” constructed on large-scale textual data, thereby handling the labels in their semantic context. In addition, a cross-modal feature aligning module is introduced in the BMML framework. Experimental results show that BMML achieves better accuracy than mainstream multi-label classification methods on several benchmark data sets.
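
The abstract does not spell out the improved channel-wise attention mechanism, so the following is only a minimal sketch of the generic idea, assuming a squeeze-and-excitation style gate: each channel of a CNN feature map is pooled to a single descriptor, a small bottleneck network turns the descriptors into per-channel weights, and the feature map is rescaled by those weights. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the reduction ratio `r` are hypothetical and are not taken from the paper.

```python
import math
import random


def channel_attention(features, w1, w2):
    """Reweight each channel of `features` by a scalar gate in (0, 1).

    features: list of C channels, each an H x W list of lists.
    w1: (C // r) x C weight matrix; w2: C x (C // r) weight matrix.
    This is a generic squeeze-and-excitation style gate, assumed for
    illustration; it is not the BMML attention module itself.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features
    ]
    # Excite: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every spatial position of channel c by gates[c].
    reweighted = [
        [[g * v for v in row] for row in ch] for ch, g in zip(features, gates)
    ]
    return reweighted, gates


# Toy input: random values stand in for CNN features and learned weights.
random.seed(0)
C, H, W, r = 8, 4, 4, 2
features = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]

reweighted, gates = channel_attention(features, w1, w2)
print(len(gates), len(reweighted), len(reweighted[0]), len(reweighted[0][0]))
```

In a trained model the gates would be learned so that channels responding to a given label are emphasized; how BMML links the resulting attention maps to labels is described in the paper, not in this sketch.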
