Recurrent Transformer Network for Remote Sensing Scene Categorisation

Remote sensing scene categorisation is a task to distinguish the basic level scene images in accordance with the contents of the subordinate level feature representations. This gives rise to a significant semantic gap between subordinate level features and the basic level scene contents. In this paper, we propose recurrent transformer networks (RTN) to mitigate the above problem. RTN incorporates learning transformation-invariant regions with transformer based attention mechanism, thus reducing the semantic gap efficiently. It also can learn the canonical appearance for the most relevant regions based on the subordinate level contents of the remote sensing scene images. The predictions of both transformation parameters and classification score are derived from the bilinear CNN pooling regression. The whole network is differentiable and can be learned end-by-end by only acquiring the basic level labels. Through extensive experiments, we demonstrate that our RTN is able to achieve state-of-the-art performance on several public remote sensing scene datasets.

[1]  Junwei Han,et al.  Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA , 2013 .

[2]  Bo Du,et al.  Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art , 2016, IEEE Geoscience and Remote Sensing Magazine.

[3]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Xiaoqiang Lu,et al.  Remote Sensing Image Scene Classification: Benchmark and State of the Art , 2017, Proceedings of the IEEE.

[5]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Junwei Han,et al.  Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[7]  Qian Song,et al.  Exploring the Use of Google Earth Imagery and Object-Based Methods in Land Use/Cover Mapping , 2013, Remote. Sens..

[8]  Qian Du,et al.  Fusing Local and Global Features for High-Resolution Scene Classification , 2017, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[9]  Anil M. Cheriyadat,et al.  Unsupervised Feature Learning for Aerial Scene Classification , 2014, IEEE Transactions on Geoscience and Remote Sensing.

[10]  Xingrui Yu,et al.  Deep learning in remote sensing scene classification: a data augmentation enhanced convolutional neural network framework , 2017 .

[11]  Jefersson Alex dos Santos,et al.  Towards better exploiting convolutional neural networks for remote sensing scene classification , 2016, Pattern Recognit..

[12]  Lei Guo,et al.  Effective and Efficient Midlevel Visual Elements-Oriented Land-Use Classification Using VHR Remote Sensing Images , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[13]  Xueming Qian,et al.  Semantic Annotation of High-Resolution Satellite Images via Weakly Supervised Learning , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[14]  Shawn D. Newsam,et al.  Bag-of-visual-words and spatial extensions for land-use classification , 2010, GIS '10.

[15]  Uwe Stilla,et al.  Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks , 2016, IEEE Geoscience and Remote Sensing Letters.

[16]  Lei Guo,et al.  When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[17]  Naif Alajlan,et al.  Land-Use Classification With Compressive Sensing Multifeature Fusion , 2015, IEEE Geoscience and Remote Sensing Letters.

[18]  Xiangtao Zheng,et al.  Joint Dictionary Learning for Multispectral Change Detection , 2017, IEEE Transactions on Cybernetics.

[19]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[20]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21]  Gui-Song Xia,et al.  AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[22]  Lu Wang,et al.  Land-use scene classification using multi-scale completed local binary patterns , 2015, Signal, Image and Video Processing.

[23]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).