Balanced vs. Imbalanced Training Data: Classifying RapidEye Data with Support Vector Machines

The accuracy of supervised image classification depends strongly on several factors, including the design of the training set (sample selection, composition, purity and size), the resolution of the input imagery and landscape heterogeneity. Training set design remains challenging because different classification algorithms respond differently to the same training data at the learning stage. This paper addresses the classification of RapidEye imagery with balanced and imbalanced training data for mapping crop types. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifiers were used to classify the data. To evaluate the influence of balanced and imbalanced training data on these algorithms, three training datasets were created: two balanced datasets with 70 and 100 pixels for each class of interest, and one imbalanced dataset in which each class has a different number of pixels. The results show that imbalanced training data reduced the accuracy of the ML and ANN classifications (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for ANN), whereas SVM was not significantly affected and even improved slightly (from 94.38% to 94.69%). These results indicate that SVM is a robust, consistent and effective classifier that performs well with both balanced and imbalanced training data. Furthermore, the training stage should be designed carefully to meet the needs of the adopted classifier.
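
The three training set designs described above can be illustrated with a minimal sketch. It is not the authors' procedure: the class names, the size of the candidate pixel pool and the per-class counts of the imbalanced set are hypothetical, since the abstract gives only the balanced sizes (70 and 100 pixels per class). The helper `sample_training_set` is likewise an illustrative name.

```python
import random
from collections import Counter


def sample_training_set(pool, per_class, seed=0):
    """Draw per_class[label] pixels at random, without replacement, for each class.

    pool: dict mapping a class label to its list of candidate pixel ids.
    per_class: dict mapping a class label to the number of pixels to draw.
    Returns a list of (pixel_id, label) pairs.
    """
    rng = random.Random(seed)
    training = []
    for label, count in per_class.items():
        candidates = pool[label]
        if count > len(candidates):
            raise ValueError(f"class {label!r} has only {len(candidates)} candidates")
        for pixel in rng.sample(candidates, count):
            training.append((pixel, label))
    return training


def class_counts(training):
    """Count the training pixels per class label."""
    return Counter(label for _, label in training)


# Hypothetical crop classes and candidate pool (500 labelled pixels per class).
classes = ["corn", "wheat", "sugar_beet", "pasture"]
pool = {label: [f"{label}_{i}" for i in range(500)] for label in classes}

# Two balanced designs: the same number of pixels for every class.
balanced_70 = sample_training_set(pool, {c: 70 for c in classes})
balanced_100 = sample_training_set(pool, {c: 100 for c in classes})

# One imbalanced design: a different (assumed) number of pixels per class.
imbalanced = sample_training_set(
    pool, {"corn": 150, "wheat": 90, "sugar_beet": 40, "pasture": 25}
)

for name, training in [
    ("balanced_70", balanced_70),
    ("balanced_100", balanced_100),
    ("imbalanced", imbalanced),
]:
    print(name, dict(class_counts(training)))
```

Each resulting set would then be used to train the same classifier and scored against one common validation set, so that differences in accuracy can be attributed to the class composition of the training data.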
