Exploring the potential role of feature selection in global land-cover mapping

ABSTRACT Global land cover is recognized as a fundamental variable in global-scale studies of environmental and climate change. Recent developments in global land-cover mapping have focused on improving spatial resolution and on using more heterogeneous features that integrate spatial, spectral, and temporal information. Although high-dimensional input features, taken as a whole, provide the discriminatory power needed to produce more accurate land-cover maps, they come at the cost of increased classification complexity. Feature selection has therefore become necessary for dimensionality reduction in classification with large numbers of input features. In this study, the potential of feature selection in global land-cover mapping is explored. A total of 63 features derived from Landsat Thematic Mapper (TM) spectral bands, Moderate Resolution Imaging Spectroradiometer (MODIS) time-series enhanced vegetation index (EVI) data, a digital elevation model (DEM), and many climate-ecological variables were input, together with global training samples, to k-nearest neighbours (k-NN) and Random Forest (RF) classifiers. Two filter feature selection algorithms, ReliefF and max-min-associated (MNA), were employed to select optimal feature subsets for the whole world and for different biomes. Mapping accuracies with and without feature selection were evaluated using a global validation sample set. Overall, the results indicate no significant accuracy improvement in global land-cover mapping after dimensionality reduction. Nevertheless, feature selection can identify useful features in different biomes and improves computational efficiency, which is valuable in global-scale computing.
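The workflow described in the abstract (score features with a filter method, keep a subset, then compare classifier accuracy with and without selection) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic, the scoring function is a simplified two-class Relief rather than the ReliefF or MNA algorithms used in the study, and the names `relief_scores`, `knn_predict`, `accuracy`, and `make_data` are hypothetical.

```python
import random

def relief_scores(X, y, n_iter=100, seed=0):
    """Simplified two-class Relief (NOT ReliefF or MNA): a feature gains weight
    when it differs on the nearest miss and loses weight when it differs on the
    nearest hit."""
    rng = random.Random(seed)
    n_features = len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    spans = []
    for j in range(n_features):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(n_features))

    weights = [0.0] * n_features
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [k for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [k for k in range(len(X)) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(n_features):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n_iter
    return weights

def knn_predict(X_train, y_train, x, k, features):
    """Majority vote among the k nearest training samples, using only `features`."""
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((X_train[i][j] - x[j]) ** 2 for j in features))
    votes = [y_train[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def accuracy(X_train, y_train, X_test, y_test, k, features):
    """Fraction of test samples classified correctly by k-NN on `features`."""
    features = list(features)
    correct = sum(knn_predict(X_train, y_train, x, k, features) == t
                  for x, t in zip(X_test, y_test))
    return correct / len(X_test)

def make_data(n, rng):
    """Toy data (assumption): features 0-1 carry the class signal, 2-5 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

rng = random.Random(42)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)

# Rank features by filter score and keep the top two.
scores = relief_scores(X_train, y_train)
selected = sorted(range(6), key=lambda j: scores[j], reverse=True)[:2]

acc_all = accuracy(X_train, y_train, X_test, y_test, 5, range(6))
acc_sel = accuracy(X_train, y_train, X_test, y_test, 5, selected)
print("selected features:", sorted(selected))
print("accuracy, all features:", acc_all)
print("accuracy, selected features:", acc_sel)
```

Because a filter method scores features independently of the classifier, the same ranking can be reused for both k-NN and RF, and the reduced feature set lowers the cost of every distance computation or tree split.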
