AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification

Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem for understanding high-resolution remote sensing imagery. In recent years, it has become an active task in remote sensing, and numerous algorithms have been proposed for it, including many machine learning and data-driven approaches. However, the existing data sets for aerial scene classification, such as the UC-Merced data set and WHU-RS19, are relatively small, and the results on them are already saturated. This largely limits the development of scene classification algorithms. This paper describes the Aerial Image data set (AID), a large-scale data set for aerial scene classification. The goal of AID is to advance the state of the art in scene classification of remote sensing images. To create AID, we collect and annotate more than 10,000 aerial scene images. In addition, we give a comprehensive review of existing aerial scene classification techniques as well as recent, widely used deep learning methods. Finally, we provide a performance analysis of typical aerial scene classification and deep learning approaches on AID, which can serve as baseline results on this benchmark.
