Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters

Research in self-supervised learning (SSL) with natural images has progressed rapidly in recent years and is now increasingly being applied to, and benchmarked with, datasets containing remotely sensed imagery. A common benchmark case is to evaluate SSL pre-trained model embeddings on datasets of remotely sensed imagery with small patch sizes, e.g., 32x32 pixels, whereas standard SSL pre-training takes place with larger patch sizes, e.g., 224x224 pixels. Furthermore, pre-training methods tend to use different image normalization preprocessing steps depending on the dataset. In this paper, we show, across seven satellite and aerial imagery datasets of varying resolution, that by simply following the preprocessing steps used in pre-training (specifically, image resizing and normalization), one can achieve significant performance improvements when evaluating the extracted features on downstream tasks, an important detail overlooked in previous work in this space. We show that by following these steps, ImageNet pre-training remains a competitive baseline for satellite-imagery-based transfer learning tasks; for example, these steps add +32.28 to overall accuracy on the So2Sat random split dataset and +11.16 on the EuroSAT dataset. Finally, we report comprehensive benchmark results with a variety of simple baseline methods for each of the seven datasets, forming an initial benchmark suite for remote sensing imagery.
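
As a rough illustration of the two preprocessing steps discussed above, the following minimal sketch resizes a small 32x32 patch to the 224x224 input size typical of pre-training and applies channel-wise normalization before feature extraction. It is not the paper's pipeline: the helper names (`resize_nearest`, `preprocess`) are hypothetical, the input is assumed to be an 8-bit, 3-channel RGB patch, nearest-neighbor sampling is used only to keep the example dependency-light (bilinear interpolation is a common alternative), and the mean and standard deviation are the widely used ImageNet statistics, which are only appropriate if the checkpoint was trained with them.

```python
import numpy as np

# ImageNet channel statistics commonly paired with ImageNet pre-trained models.
# Assumption: the checkpoint being evaluated was trained with these values.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Resize an (H, W, C) image to (size, size, C) with nearest-neighbor sampling."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size  # source row for each output row
    cols = np.arange(size) * width // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]


def preprocess(patch_uint8: np.ndarray, size: int = 224) -> np.ndarray:
    """Hypothetical helper: match the pre-training input size and normalization."""
    resized = resize_nearest(patch_uint8, size)
    scaled = resized.astype(np.float32) / 255.0           # map 8-bit values to [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel standardization
    return normalized.transpose(2, 0, 1)                  # (C, H, W) layout


# Synthetic stand-in for a small remote sensing patch (not real data).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
model_input = preprocess(patch)
print(model_input.shape)  # (3, 224, 224)
```

The resulting array would then be passed to the frozen pre-trained encoder in place of the raw 32x32 patch. Multispectral inputs would need their own channel statistics and value range.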
