Spatial-Spectral Transformer for Hyperspectral Image Classification

Recently, many deep convolutional neural network (CNN)-based methods have been proposed for hyperspectral image (HSI) classification. Although CNN-based methods are good at spatial feature extraction, they have difficulty handling sequential data and are not good at modeling long-range dependencies. However, the spectra of an HSI are a kind of sequential data, and an HSI usually contains hundreds of bands. Therefore, it is difficult for CNNs to process HSI well. On the other hand, the Transformer model, which is based on an attention mechanism, has proved its advantages in processing sequential data. To capture long-range relationships among the sequential spectra in HSI, this study investigates the Transformer for HSI classification. Specifically, a new classification framework titled spatial-spectral Transformer (SST) is proposed. In the proposed SST, a well-designed CNN is used to extract the spatial features, a modified Transformer (a Transformer with dense connections, i.e., DenseTransformer) is proposed to capture sequential spectral relationships, and a multilayer perceptron is used to finish the final classification task. Furthermore, dynamic feature augmentation, which aims to alleviate the overfitting problem and thereby improve the model's generalization, is proposed and added to the SST (SST-FA). In addition, to address the issue of limited training samples in HSI classification, transfer learning is combined with SST, and another classification framework titled transferring-SST (T-SST) is proposed. Finally, to mitigate the overfitting problem and improve the classification accuracy, label smoothing is introduced for the T-SST-based classification framework (T-SST-L). The proposed SST, SST-FA, T-SST, and T-SST-L are tested on three widely used hyperspectral datasets.
The results show that the proposed models are competitive with state-of-the-art methods, which suggests that the Transformer opens a new window for HSI classification.
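
As an illustration of the label smoothing mentioned for T-SST-L, the following is a minimal sketch in Python with NumPy. The abstract does not specify the exact smoothing scheme, so the sketch assumes the common uniform variant, in which a fraction epsilon of the probability mass is spread evenly over all classes. The function names smooth_labels and cross_entropy, and the value epsilon = 0.1, are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def smooth_labels(labels, num_classes, epsilon=0.1):
    """Turn integer class labels into uniformly smoothed one-hot targets.

    Assumption: uniform label smoothing, where each target becomes
    (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes


def cross_entropy(logits, targets):
    """Mean cross-entropy between logits and (possibly soft) targets."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


# Two pixels, three classes (illustrative values only).
labels = np.array([0, 2])
logits = np.array([[4.0, 0.5, 0.1],
                   [0.2, 0.3, 3.0]])

hard_targets = smooth_labels(labels, num_classes=3, epsilon=0.0)
soft_targets = smooth_labels(labels, num_classes=3, epsilon=0.1)

loss_hard = cross_entropy(logits, hard_targets)
loss_soft = cross_entropy(logits, soft_targets)
```

Because the smoothed targets always assign some probability to the non-target classes, the loss penalizes overconfident predictions, which is the usual motivation for using label smoothing against overfitting.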