Multi-Scale Convolutional Neural Network for Remote Sensing Scene Classification

In recent years the problem of scene classification in remote sensing has attracted a considerable amount of attention. Solution for this important problem based on deep convolutional neural networks (CNN) are currently state-of-the-art. So far all CNNs used images of fixed size (typically $224\times 224$ which commonly used in other fields of computer vision). In this paper, we propose a multi-scale deep CNN architecture that can accept variable image sizes. We achieve this by using multiple CNN, that share some or all parameters, followed by a merge layer, fully connected layers, and finally a softmax layer for classification. In each epoch we train the network with a batch of images of all scales. We have implemented this architecture using three SqueezeNet CNNs trained on three different scales of scene images. Then we carried out experiments on three well know datasets, namely UC Merced, KSA, and AID datasets. Preliminary results show that this multi-scale CNN do converge just as the traditional single-scale training, and leads to better testing accuracy.

[1]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Shawn D. Newsam,et al.  Bag-of-visual-words and spatial extensions for land-use classification , 2010, GIS '10.

[3]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Jiaxing Zhang,et al.  Scale-Invariant Convolutional Neural Networks , 2014, ArXiv.

[6]  Thierry Blu,et al.  Linear interpolation revitalized , 2004, IEEE Transactions on Image Processing.

[7]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[8]  Naif Alajlan,et al.  Domain Adaptation Network for Cross-Scene Classification , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[9]  Iasonas Kokkinos,et al.  Deep Filter Banks for Texture Recognition, Description, and Segmentation , 2015, International Journal of Computer Vision.

[10]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.