MultiScene: A Large-Scale Dataset and Benchmark for Multiscene Recognition in Single Aerial Images

Aerial scene recognition is a fundamental research problem in interpreting high-resolution aerial imagery. Over the past few years, most studies have focused on classifying an image into a single scene category, whereas in real-world scenarios a single image more often contains multiple scenes. Therefore, in this article, we investigate a more practical yet underexplored task: multiscene recognition in single images. To this end, we create a large-scale dataset, called MultiScene, composed of 100 000 unconstrained high-resolution aerial images. Because manually labeling such images is extremely arduous, we resort to low-cost annotations from crowdsourcing platforms, e.g., OpenStreetMap (OSM). However, OSM data can be incomplete and incorrect, which introduces noise into the image labels. To address this issue, we visually inspect 14 000 images and correct their scene labels, yielding a subset of cleanly annotated images, named MultiScene-Clean. With it, we can develop and evaluate deep networks for multiscene recognition using clean data. Moreover, we provide crowdsourced annotations of all images for the purpose of studying network learning with noisy labels. We conduct experiments with an extensive set of baseline models on both MultiScene-Clean and MultiScene to offer benchmarks for multiscene recognition in single images and for learning from noisy labels for this task, respectively. To facilitate progress, we make our dataset and trained models available at https://gitlab.lrz.de/ai4eo/reasoning/multiscene.
