Multiple Instance Dense Connected Convolution Neural Network for Aerial Image Scene Classification

In contrast with nature scenes, aerial scenes are often composed of many objects crowdedly distributed on the surface in bird’s view, the description of which usually demands more discriminative features as well as local semantics. However, when applied to scene classification, most of the existing convolution neural networks (ConvNets) tend to depict global semantics of images, and the loss of low- and mid-level features can hardly be avoided, especially when the model goes deeper. To tackle these challenges, in this paper, we propose a multiple-instance densely-connected ConvNet (MIDC-Net) for aerial scene classification. It regards aerial scene classification as a multiple-instance learning problem so that local semantics can be further investigated. Our classification model consists of an instance-level classifier, a multiple instance pooling and followed by a bag-level classification layer. In the instance-level classifier, we propose a simplified dense connection structure to effectively preserve features from different levels. The extracted convolution features are further converted into instance feature vectors. Then, we propose a trainable attention-based multiple instance pooling. It highlights the local semantics relevant to the scene label and outputs the bag-level probability directly. Finally, with our bag-level classification layer, this multiple instance learning framework is under the direct supervision of bag labels. Experiments on three widely-utilized aerial scene benchmarks demonstrate that our proposed method outperforms many state-of-the-art methods by a large margin with much fewer parameters.

[1]  Gui-Song Xia,et al.  Bag-of-Visual-Words Scene Classifier With Local and Global Features for High Spatial Resolution Remote Sensing Imagery , 2016, IEEE Geoscience and Remote Sensing Letters.

[2]  Gui-Song Xia,et al.  AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[3]  Xinpeng Zhang,et al.  Detection of double JPEG compression using modified DenseNet model , 2018, Multimedia Tools and Applications.

[4]  Xiao Xiang Zhu,et al.  Vehicle Instance Segmentation From Aerial Image and Video Using a Multitask Learning Residual Fully Convolutional Network , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[5]  Kilian Q. Weinberger,et al.  CondenseNet: An Efficient DenseNet Using Learned Group Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Gui-Song Xia,et al.  Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery , 2015, Remote. Sens..

[7]  Yang Long,et al.  Learning RoI Transformer for Oriented Object Detection in Aerial Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Pierre-Marc Jodoin,et al.  Perspective-SIFT: An efficient tool for low-altitude remote sensing image registration , 2013, Signal Process..

[9]  Sepp Hochreiter,et al.  The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions , 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[10]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Peng Tang,et al.  Learning Multi-Instance Deep Discriminative Patterns for Image Classification , 2017, IEEE Transactions on Image Processing.

[12]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Yi Yang,et al.  Discovering Discriminative Graphlets for Aerial Image Categories Recognition , 2013, IEEE Transactions on Image Processing.

[14]  Fahad Shahbaz Khan,et al.  Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification , 2017, ArXiv.

[15]  Zhenwei Shi,et al.  Random Access Memories: A New Paradigm for Target Detection in High Resolution Aerial Remote Sensing Images , 2018, IEEE Transactions on Image Processing.

[16]  Xudong Jiang,et al.  LBP-Based Edge-Texture Features for Object Recognition , 2014, IEEE Transactions on Image Processing.

[17]  Martial Hebert,et al.  Log-DenseNet: How to Sparsify a DenseNet , 2017, ArXiv.

[18]  Matthias Bethge,et al.  Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet , 2019, ICLR.

[19]  Max Welling,et al.  Attention-based Deep Multiple Instance Learning , 2018, ICML.

[20]  Lei Guo,et al.  When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[21]  Lihi Zelnik-Manor,et al.  OTC: A Novel Local Descriptor for Scene Classification , 2014, ECCV.

[22]  Yaoqin Xie,et al.  A Sparse-View CT Reconstruction Method Based on Combination of DenseNet and Deconvolution , 2018, IEEE Transactions on Medical Imaging.

[23]  Jianfei Cai,et al.  MIML-FCN+: Multi-Instance Multi-Label Learning via Fully Convolutional Networks with Privileged Information , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Gui-Song Xia,et al.  Dirichlet-Derived Multiple Topic Scene Classification Model for High Spatial Resolution Remote Sensing Imagery , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[25]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[26]  Tat-Seng Chua,et al.  SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Xiaoqiang Lu,et al.  Remote Sensing Image Scene Classification: Benchmark and State of the Art , 2017, Proceedings of the IEEE.

[28]  Bohyung Han,et al.  Progressive Attention Networks for Visual Attribute Prediction , 2016, BMVC.

[29]  Ping Tang,et al.  Land-Use Scene Classification Using a Concentric Circle-Structured Multiscale Bag-of-Visual-Words Model , 2014, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[30]  Jiwen Lu,et al.  Learning a Discriminative Distance Metric With Label Consistency for Scene Classification , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[31]  Xiao Xiang Zhu,et al.  Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources , 2017, IEEE Geoscience and Remote Sensing Magazine.

[32]  Zhi-Hua Zhou,et al.  Adapting RBF Neural Networks to Multi-Instance Learning , 2006, Neural Processing Letters.

[33]  Bin Luo,et al.  Indexing of Satellite Images With Different Resolutions by Wavelet Features , 2008, IEEE Transactions on Image Processing.

[34]  Samia Boukir,et al.  Relevance of airborne lidar and multispectral image data for urban scene classification using Random Forests , 2011 .

[35]  Xiaohui Xie,et al.  Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification , 2016, bioRxiv.

[36]  Junwei Han,et al.  Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding , 2014 .

[37]  Chong Wang,et al.  Large-Scale Weakly Supervised Object Localization via Latent Category Learning , 2015, IEEE Transactions on Image Processing.

[38]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Jiebo Luo,et al.  DOTA: A Large-Scale Dataset for Object Detection in Aerial Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[41]  Zhi-Hua Zhou,et al.  Improve Multi-Instance Neural Networks through Feature Selection , 2004, Neural Processing Letters.

[42]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[43]  Cong Lin,et al.  Integrating Multilayer Features of Convolutional Neural Networks for Remote Sensing Scene Classification , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[44]  Shawn D. Newsam,et al.  Geographic Image Retrieval Using Local Invariant Features , 2013, IEEE Transactions on Geoscience and Remote Sensing.

[45]  Aleksej Avramovic,et al.  Block-based semantic classification of high-resolution multispectral aerial images , 2016, Signal Image Video Process..

[46]  Bohyung Han,et al.  Hierarchical Attention Networks , 2016, ArXiv.

[47]  Oded Maron,et al.  Multiple-Instance Learning for Natural Scene Classification , 1998, ICML.

[48]  Wenyu Liu,et al.  Revisiting multiple instance neural networks , 2016, Pattern Recognit..

[49]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[50]  Huan Du,et al.  Depth-Aware Salient Object Detection and Segmentation via Multiscale Discriminative Saliency Fusion and Bootstrap Learning , 2017, IEEE Transactions on Image Processing.

[51]  Yanfei Zhong,et al.  A spectral–structural bag-of-features scene classifier for very high spatial resolution remote sensing imagery , 2016 .

[52]  Liangpei Zhang,et al.  Scene Classification Based on the Multifeature Fusion Probabilistic Topic Model for High Spatial Resolution Remote Sensing Imagery , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[53]  Deyu Meng,et al.  Co-Saliency Detection via a Self-Paced Multiple-Instance Learning Framework , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Jefersson Alex dos Santos,et al.  Towards better exploiting convolutional neural networks for remote sensing scene classification , 2016, Pattern Recognit..

[55]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[56]  Liangpei Zhang,et al.  Adaptive Deep Sparse Semantic Modeling Framework for High Spatial Resolution Image Scene Classification , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[57]  Shihong Du,et al.  A Linear Dirichlet Mixture Model for decomposing scenes: Application to analyzing urban functional zonings , 2015 .

[58]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Yong Man Ro,et al.  Color Local Texture Features for Color Face Recognition , 2012, IEEE Transactions on Image Processing.

[60]  Qilong Wang,et al.  ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Zhipeng Jia,et al.  Constrained Deep Weak Supervision for Histopathology Image Segmentation , 2017, IEEE Transactions on Medical Imaging.

[62]  Liangpei Zhang,et al.  Pre-Trained AlexNet Architecture with Pyramid Pooling and Supervision for High Spatial Resolution Remote Sensing Image Scene Classification , 2017, Remote. Sens..

[63]  X. Jin,et al.  A new modified panoramic UAV image stitching model based on the GA-SIFT and adaptive threshold method , 2017, Memetic Comput..

[64]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[65]  Jefersson Alex dos Santos,et al.  Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[66]  Arnold W. M. Smeulders,et al.  PicToSeek: combining color and shape invariant features for image retrieval , 2000, IEEE Trans. Image Process..

[67]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[68]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[69]  Brendan J. Frey,et al.  Classifying and segmenting microscopy images with deep multiple instance learning , 2015, Bioinform..

[70]  David A. Clausi,et al.  Weakly Supervised Classification of Remotely Sensed Imagery Using Label Constraint and Edge Penalty , 2017, IEEE Transactions on Geoscience and Remote Sensing.