Deep Learning for Multilabel Remote Sensing Image Annotation With Dual-Level Semantic Concepts

Multilabel remote sensing (RS) image annotation is a challenging and time-consuming task that requires considerable expert knowledge. Most existing RS image annotation methods rely on handcrafted features and multistage processes, which limits both their efficiency and their effectiveness. An RS image can be assigned a single label at the scene level to describe the scene as a whole and multiple labels at the object level to represent its major components. The multiple labels can serve as supervision for annotation, whereas the single label can serve as additional information for exploiting scene-level similarity relationships. Exploiting these dual-level semantic concepts, we propose an end-to-end deep learning framework for object-level multilabel annotation of RS images. The proposed framework consists of a shared convolutional neural network for discriminative feature learning, a classification branch for multilabel annotation, and an embedding branch for preserving the scene-level similarity relationships. In the classification branch, an attention mechanism is introduced to generate attention-aware features, and skip-layer connections are incorporated to combine information from multiple layers. The embedding branch rests on the premise that images with the same scene-level semantic concept should have similar visual representations. The proposed method adopts the binary cross-entropy loss for classification and the triplet loss for image embedding learning. Evaluations on three multilabel RS image data sets demonstrate the effectiveness of the proposed method and its superiority over state-of-the-art methods.
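
As a rough illustration of the two training objectives named above, the following sketch computes a binary cross-entropy term over object-level labels and a triplet term over scene-level embeddings, then sums them. The abstract does not state how the two losses are combined, so the weighted sum, the weight `lam`, the `margin` value, the squared Euclidean distance, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math


def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over the object-level labels of one image.

    probs  : predicted probabilities in [0, 1], one per label
    labels : ground-truth indicators (0 or 1), one per label
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss.

    anchor and positive are embeddings of images sharing a scene-level
    label; negative is an embedding of an image from a different scene.
    The margin value is an assumption.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def joint_loss(probs, labels, anchor, positive, negative, lam=1.0, margin=1.0):
    """Assumed combination: classification loss plus weighted embedding loss."""
    return bce_loss(probs, labels) + lam * triplet_loss(anchor, positive, negative, margin)
```

The triplet term is zero once the same-scene pair is closer than the different-scene pair by at least the margin, so under this assumed combination only the classification term remains for triplets that are already well separated.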
