Spectral information obtained by hyperspectral sensors enables better characterization, identification, and classification of the objects in a scene of interest. Unfortunately, several factors have to be addressed in the classification of hyperspectral data, including the acquisition process, the high dimensionality of spectral samples, and the limited availability of labeled data. Consequently, it is of great importance to design hyperspectral image classification schemes that can cope with the curse of dimensionality and still produce accurate classification results, even from a limited number of training samples. To that end, we propose a novel machine learning technique that addresses the hyperspectral image classification problem by employing state-of-the-art Convolutional Neural Networks (CNNs). The approach introduced in this work exploits the fact that the spatio-spectral information of an input scene can be encoded via CNNs and combined with multi-class classifiers. We apply the proposed method to a novel dataset acquired by a snapshot mosaic spectral camera and demonstrate the potential of the proposed approach for accurate classification.

Introduction

Recent advances in optics and photonics are addressing the demand for Hyperspectral Imaging (HSI) systems with higher spatial and spectral resolution, able to reveal the physical properties of the objects in a scene of interest [1]. This type of data is crucial for multiple applications, such as remote sensing, precision agriculture, the food industry, and medical and biological applications [2]. Although hyperspectral imaging systems demonstrate substantial advantages in structure identification, the HSI acquisition and processing stages usually introduce multiple constraints.
Slow acquisition time, limited spectral and spatial resolution, and the need for linear motion in the case of traditional spectral imagers are just a few of the limitations of hyperspectral sensors that must be addressed. Snapshot spectral imaging addresses these limitations by sampling the full spatio-spectral cube during each exposure, through a mapping of pixels to specific spectral bands [13], [14], [15].

High spatial and spectral resolution hyperspectral imaging systems demonstrate significant advantages in object recognition and material detection applications, by identifying the subtle differences in the spectral signatures of various objects. This discrimination of materials based on their spectral profile can be considered a classification task, where groups of hyper-pixels are assigned to a particular class based on their reflectance properties, exploiting training examples for modeling each class. State-of-the-art hyperspectral classification approaches are composed of two steps: first, hand-crafted feature descriptors are extracted from the training data, and second, the computed features are used to train classifiers, such as Support Vector Machines (SVMs) [3].

Feature extraction is a significant process in multiple computer vision tasks, such as object recognition, image segmentation, and classification. Traditional approaches consider carefully designed hand-crafted features, such as the Scale Invariant Feature Transform (SIFT) [5] or the Histogram of Oriented Gradients (HoG) [6]. Despite their impressive performance, they are not able to efficiently encode the underlying characteristics of higher dimensional image data, and they require significant human intervention during the design. In hyperspectral imagery, various feature extraction techniques have been proposed, including decision boundary feature extraction (DBFE) [7] and the scheme of Kumar et al. [8], which combines highly correlated adjacent spectral bands into fewer features by means of top-down and bottom-up algorithms. Additionally, in Earth monitoring remote sensing applications, characteristic feature descriptors are the Normalized Difference Vegetation Index (NDVI) and the Land Surface Temperature (LST). Nevertheless, it is extremely difficult to discover which features are significant for each hyperspectral classification task, due to the high diversity and heterogeneity of the acquired materials. This motivates the need for efficient feature representations directly extracted from input data through deep representation learning [10], a cutting-edge paradigm aiming to learn discriminative and robust representations of the input data for use in higher level tasks.

The objective of this work is to propose a novel approach for discriminating between different objects in a scene of interest, by introducing a deep feature learning based classification scheme on snapshot mosaic hyperspectral imagery. The proposed system utilizes Convolutional Neural Networks (CNNs) [9] and several multi-class classifiers in order to extract high-level representative spatio-spectral features, substantially increasing the performance of the subsequent classification task. Unlike traditional hyperspectral classification techniques that extract complex handcrafted features, the proposed algorithm adheres to a machine learning paradigm, able to work even with a small number of labeled training data. To the best of our knowledge, the proposed scheme is the first deep learning-based technique focused on the classification of hyperspectral snapshot mosaic imagery. A visual description of the proposed scheme is presented in Figure 1, which shows the block diagram for the classification of snapshot mosaic spectral imagery using a deep CNN. The rest of the paper is organized as follows.
Section 2 presents a brief review of the related work concerning deep learning approaches for the classification of hyperspectral data. In Section 3 we outline the key theoretical components of the CNN. Section 4 provides an overview of the generated hyperspectral dataset along with experimental results, while the paper concludes in Section 5.

Figure 1: Block diagram of the proposed scheme. Our system decomposes the input hypercubes into their distinct spectral bands and extracts AlexNet-based high-level features for each spectral observation. The concatenated feature vectors are given as inputs to multi-class classifiers in order to produce the final prediction.

Related Work

Deep learning (DL) is a special case of representation learning which aims at learning multiple hierarchical levels of representations, leading to more abstract features that are more beneficial in classification [16]. Recently, DL has been considered for various problems in the remote sensing community, including land cover detection [17], [18], building detection [21], and scene classification [22]. Specifically, the authors in [17] considered the paradigm of Stacked Sparse Autoencoders (SSAE) as a feature extraction mechanism for multi-label classification of hyperspectral images. Another feature learning approach for the problem of multi-label land cover classification was proposed in [18], where the authors utilize single and multiple layer Sparse Autoencoders in order to learn representative features able to facilitate the classification process of MODIS multispectral data. In addition to the Sparse Autoencoders framework, over the past few years CNNs have been established as an effective class of models for understanding image content, producing significant performance gains in image recognition, segmentation, and detection, among others [19], [20].
However, a major limitation of CNNs is the long training time needed to effectively optimize their large number of parameters. Although it has been shown that CNNs achieve superior performance on a number of visual recognition tasks, their heavy computational requirements have limited their application to a handful of hyperspectral feature learning and classification tasks. Recently, the authors in [23] utilized CNNs for large-scale remote sensing image classification and proposed an efficient procedure to overcome the problem of inadequate training data. Additionally, in [24] the authors design a CNN able to extract spatio-spectral features for classification purposes. Another class of techniques solves the classification problem by extracting the principal components of the hyperspectral scenes and incorporating convolutions only in the spatial domain [25], [31].

Feature Learning for Classification

The main purpose of this work is to classify an N spectral band image, utilizing both its spatial and spectral dimensions. In order to accomplish this task, we employ a sequence of filters w_{x,y}, of size (m×m), which are convolved with the “hyper-pixels” of the spectral cube, aiming at encoding spatial invariance. To achieve scale invariance, each convolution layer is followed by a pooling layer. These learned features are considered as input to a multi-class SVM classifier, in order to implement the labelling task. In the following section, we present the formulation of CNNs and how they can be applied in the context of snapshot mosaic hyperspectral classification.

Convolutional Neural Networks

While in fully-connected deep neural networks the activation of each hidden unit is computed by multiplying the entire input by the corresponding weights for each neuron in that layer, in CNNs the activation of each hidden unit is computed for a small input area.
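To make the local filtering and pooling steps concrete, the following is a minimal NumPy sketch, not the AlexNet-based pipeline used in this work. It assumes a single m×m filter shared across bands, a ReLU activation, and 2×2 max pooling; the function names, the random filter, and the 8×8×4 toy cube are all hypothetical choices made for illustration. As in most CNN libraries, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np


def conv2d_valid(band, kernel):
    """Slide an m x m kernel over one spectral band ('valid' region only).

    Each output value depends only on an m x m patch of the input,
    i.e. a small local area rather than the entire image.
    """
    m = kernel.shape[0]
    rows = band.shape[0] - m + 1
    cols = band.shape[1] - m + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(band[i:i + m, j:j + m] * kernel)
    return out


def relu(x):
    """Element-wise rectified linear activation (an assumed choice)."""
    return np.maximum(x, 0.0)


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    rows = fmap.shape[0] // size
    cols = fmap.shape[1] // size
    trimmed = fmap[:rows * size, :cols * size]
    return trimmed.reshape(rows, size, cols, size).max(axis=(1, 3))


def band_features(cube, kernel):
    """Apply conv -> ReLU -> pool to every band and concatenate the results."""
    feats = [
        max_pool(relu(conv2d_valid(cube[:, :, b], kernel))).ravel()
        for b in range(cube.shape[2])
    ]
    return np.concatenate(feats)


rng = np.random.default_rng(0)
cube = rng.random((8, 8, 4))            # hypothetical 8 x 8 patch with N = 4 bands
kernel = rng.standard_normal((3, 3))    # random stand-in for a learned 3 x 3 filter
features = band_features(cube, kernel)
print(features.shape)                   # (36,): 3 x 3 pooled values per band, 4 bands
```

In a full system, a concatenated vector of this kind, computed from learned rather than random filters, would be what is passed to the multi-class classifier.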
CNNs are composed of convolutional layers which alternate with subsampling (pooling) layers, resulting in a hierarchy of increasingly abstract features, optionally followed by fully connected layers to carry out the final labeling into categories. Typically, the final layer of the CNN produces as many outputs as the number of categories, or a single output for the case of binary labeling. At the convolution layer, the previous layer’s feature maps are first convolved with learnable kernels and then are passed through the activation function to form
[1] Wei-Yin Loh, et al. Classification and regression trees. WIREs Data Mining Knowl. Discov., 2011.
[2] Nikolaos Doulamis, et al. Deep supervised learning for hyperspectral data classification through convolutional neural networks. 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2015.
[3] David G. Lowe, et al. Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
[4] Carolina Blanch, et al. A tiny VIS-NIR snapshot multispectral camera. Photonics West - Optoelectronic Materials and Devices, 2015.
[5] Patrice Y. Simard, et al. Best practices for convolutional neural networks applied to visual document analysis. Seventh International Conference on Document Analysis and Recognition, 2003.
[6] Johan A. K. Suykens, et al. Least Squares Support Vector Machine Classifiers. Neural Processing Letters, 1999.
[7] Xiuping Jia, et al. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks. IEEE Transactions on Geoscience and Remote Sensing, 2016.
[8] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[9] Bill Triggs, et al. Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
[10] Yanfei Zhong, et al. Large patch convolutional neural networks for the scene classification of high spatial resolution imagery. 2016.
[11] Philippe Soussan, et al. A CMOS-compatible, integrated approach to hyper- and multispectral imaging. 2014 IEEE International Electron Devices Meeting, 2014.
[12] David A. Landgrebe, et al. Feature Extraction Based on Decision Boundaries. IEEE Trans. Pattern Anal. Mach. Intell., 1993.
[13] Michalis Zervakis, et al. Deep learning for multi-label land cover classification. SPIE Remote Sensing, 2015.
[14] Panagiotis Tsakalides, et al. Deep Feature Learning for Hyperspectral Image Classification and Land Cover Estimation. 2016.
[15] Guigang Zhang, et al. Deep Learning. Int. J. Semantic Comput., 2016.
[16] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks. Commun. ACM, 2012.
[17] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 2014.
[18] Michael W. Kudenov, et al. Review of snapshot spectral imaging technologies. Optics and Precision Engineering, 2013.
[19] Pat Morin, et al. Output-Sensitive Algorithms for Computing Nearest-Neighbour Decision Boundaries. WADS, 2003.
[20] Joydeep Ghosh, et al. Best-bases feature extraction algorithms for classification of hyperspectral data. IEEE Trans. Geosci. Remote. Sens., 2001.
[21] IEEE Xplore, et al. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
[22] Silvio Savarese, et al. Comparing image classification methods: K-nearest-neighbor and support-vector-machines. 2012.
[23] Yichuan Tang, et al. Deep Learning using Linear Support Vector Machines. arXiv:1306.0239, 2013.
[24] Lorenzo Bruzzone, et al. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 2004.
[25] Pierre Alliez, et al. Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification. IEEE Transactions on Geoscience and Remote Sensing, 2017.
[26] Ah Chung Tsoi, et al. Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks, 1997.
[27] Nikos Komodakis, et al. Building detection in very high resolution multispectral data with deep learning features. 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2015.
[28] Andy Lambrechts, et al. A compact snapshot multispectral imager with a monolithically integrated per-pixel filter mosaic. Photonics West - Micro and Nano Fabricated Electromechanical and Optical Components, 2014.
[29] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.