Paper Information - SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models

Interpreting remote sensing imagery enables numerous downstream applications, ranging from land-use planning to deforestation monitoring. Robustly classifying this data is challenging due to the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, no curated benchmark yet suitably covers this diversity. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark: the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a public leaderboard (https://satinbenchmark.github.io) to guide and track the progress of VL models in this important domain.
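
As a rough illustration of what zero-shot classification with a VL model typically involves, the sketch below picks, for each image, the class whose text embedding has the highest cosine similarity to the image embedding, then reports accuracy. This is a minimal sketch and not the SATIN evaluation code: the embeddings are hand-written toy vectors standing in for the outputs of a model's image and text encoders, and the names `zero_shot_predict`, `accuracy`, and the three class labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_predict(image_emb, class_text_embs):
    """Return the class name whose text embedding is closest to the image embedding."""
    return max(class_text_embs, key=lambda name: cosine(image_emb, class_text_embs[name]))


def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy stand-ins for text-encoder outputs (assumption: one prompt per class).
class_text_embs = {
    "forest": [0.9, 0.1, 0.0],
    "river": [0.1, 0.9, 0.1],
    "airport": [0.0, 0.2, 0.9],
}

# Toy stand-ins for image-encoder outputs, with their ground-truth labels.
image_embs = [[0.8, 0.2, 0.1], [0.2, 0.7, 0.3], [0.1, 0.1, 0.8]]
labels = ["forest", "river", "airport"]

predictions = [zero_shot_predict(emb, class_text_embs) for emb in image_embs]
print(predictions, accuracy(predictions, labels))
```

No training on the target classes is needed in this setup; changing the task only means swapping the set of class prompts, which is what makes a single model applicable across many datasets in a metadataset.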

Samuel Albanie | K. Han | J. Roberts
