A CNN-Transformer Hybrid Approach for Crop Classification Using Multitemporal Multisensor Images

Multitemporal Earth observation capability plays an increasingly important role in crop monitoring. As satellites acquire remote sensing images more frequently, fully exploiting the phenological patterns implicit in dense multitemporal data becomes increasingly important. In this article, we propose a CNN-transformer approach to crop classification. The model borrows the transformer architecture from natural language processing (NLP) to mine the patterns of multitemporal sequences. First, after unifying the spatial-spectral scale of the multiband data acquired by different sensors, we obtain scale-consistent features and position features of the multitemporal sequence. Second, we adopt multilayer encoder modules derived from the transformer to mine deep correlation patterns in the multitemporal sequence. Finally, a feed-forward layer and a softmax layer serve as the output layers of the model to predict crop categories. The proposed CNN-transformer approach is demonstrated in a crop-rich agricultural region in central California, using 65 multitemporal multisensor profiles acquired in 2018 by Sentinel-2A/B and Landsat-8. By combining multiband multiresolution fusion, correlation extraction from multitemporal sequences, and category feature extraction, the proposed method achieves a significant improvement in classification performance over other traditional methods.
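The three-stage pipeline described above (scale unification with position features, stacked transformer encoders, then feed-forward and softmax outputs) can be sketched in NumPy. This is a minimal, untrained sketch under stated assumptions, not the authors' implementation. The CNN that unifies the spatial-spectral scale of each sensor is replaced by a hypothetical linear projection `W_in` applied to a per-pixel series of 6 bands assumed to be shared by both sensors. The position feature is assumed to be a sinusoidal encoding of acquisition day-of-year. Attention is single-head, layer norm has no learned gain or bias, and encoder outputs are averaged over dates before classification. All names (`positional_encoding`, `encoder_layer`, `classify`, `init_params`) and sizes are illustrative.

```python
import numpy as np

def positional_encoding(days, d_model):
    """Sinusoidal encoding as in the original Transformer, indexed here by
    acquisition day-of-year instead of sequence index (an assumption)."""
    days = np.asarray(days, dtype=float)[:, None]           # (T, 1)
    i = np.arange(d_model)[None, :]                          # (1, d)
    angle = days / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))  # (T, d)

def layer_norm(x, eps=1e-6):
    # Normalize each date's feature vector (no learned gain/bias here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encoder_layer(x, p):
    """One post-norm encoder layer: single-head self-attention + ReLU FFN."""
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) weights across dates
    x = layer_norm(x + attn @ v @ p["Wo"])           # residual + norm
    ffn = np.maximum(0.0, x @ p["W1"]) @ p["W2"]
    return layer_norm(x + ffn)                       # residual + norm

def classify(series, days, params):
    """series: (T, B) band values per date; returns class probabilities."""
    x = series @ params["W_in"]      # hypothetical stand-in for CNN scale unification
    x = x + positional_encoding(days, x.shape[1])
    for p in params["layers"]:
        x = encoder_layer(x, p)
    pooled = x.mean(axis=0)          # average over dates (an assumption)
    return softmax(pooled @ params["W_out"])  # feed-forward + softmax output

def init_params(n_bands, d_model, d_ff, n_layers, n_classes, seed=0):
    # Random, untrained weights purely for illustration.
    rng = np.random.default_rng(seed)
    w = lambda *s: rng.normal(0.0, 0.1, size=s)
    layers = [dict(Wq=w(d_model, d_model), Wk=w(d_model, d_model),
                   Wv=w(d_model, d_model), Wo=w(d_model, d_model),
                   W1=w(d_model, d_ff), W2=w(d_ff, d_model))
              for _ in range(n_layers)]
    return dict(W_in=w(n_bands, d_model), layers=layers,
                W_out=w(d_model, n_classes))

# Usage: 65 dates (matching the count in the text), 6 hypothetical shared bands, 5 classes.
days = np.sort(np.random.default_rng(1).integers(1, 366, size=65))
series = np.random.default_rng(2).random((65, 6))
params = init_params(n_bands=6, d_model=32, d_ff=64, n_layers=3, n_classes=5)
probs = classify(series, days, params)
print(probs.shape, float(probs.sum()))
```

Averaging over dates is only one way to reduce the sequence to a single vector; a learned classification token prepended to the sequence is a common alternative in transformer classifiers.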
