A Comparison of Machine Learning Techniques Applied to Landsat-5 TM Spectral Data for Biomass Estimation

Abstract. Machine learning combines inductive and automated techniques for recognizing patterns. These techniques can be used with remote sensing datasets to map aboveground biomass (AGB) with an acceptable degree of accuracy for the evaluation and management of forest ecosystems. Unfortunately, statistically rigorous comparisons of machine learning algorithms are scarce. The aim of this study was to compare the performance of the three nonparametric machine learning techniques most commonly reported in the literature, viz., Support Vector Machine (SVM), k-nearest neighbor (kNN), and Random Forest (RF), with that of parametric multiple linear regression (MLR) for estimating AGB from Landsat-5 Thematic Mapper (TM) spectral reflectance data, texture features derived from the Normalized Difference Vegetation Index (NDVI), and topographical features derived from a digital elevation model (DEM). The results obtained for 99 permanent sites (used for calibration and validation of the models), established during the winter of 2011 by systematic sampling in the state of Durango (Mexico), showed that SVM performed best once the parameterization had been optimized. Otherwise, SVM could be outperformed by RF. However, kNN yielded the best overall results in relation to the goodness-of-fit measures. The findings confirm that nonparametric machine learning algorithms are powerful tools for estimating AGB with datasets derived from sensors with medium spatial resolution.
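To make the workflow in the abstract more concrete, the following is a minimal sketch of one of the compared techniques, kNN regression, applied to an NDVI-based predictor and a topographic predictor. It is not the authors' implementation: the data are synthetic, the linear relationship between predictors and AGB is invented for illustration, the predictors are left unscaled, and leave-one-out cross-validation is assumed here as the validation scheme because the abstract does not specify one. The function names (`ndvi`, `knn_predict`, `loo_rmse`) are hypothetical.

```python
import math
import random


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest training samples."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_rmse(xs, ys, k):
    """Leave-one-out RMSE: each plot is predicted from all the others."""
    sq_err = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        sq_err += (knn_predict(rest_x, rest_y, xs[i], k) - ys[i]) ** 2
    return math.sqrt(sq_err / len(xs))


# Synthetic stand-in for 99 field plots (NOT the study's data).
rng = random.Random(0)
plots_x, plots_y = [], []
for _ in range(99):
    red = rng.uniform(0.02, 0.15)         # assumed red reflectance range
    nir = rng.uniform(0.20, 0.50)         # assumed near-infrared reflectance range
    elevation_km = rng.uniform(1.5, 3.0)  # assumed DEM-derived predictor
    v = ndvi(nir, red)
    plots_x.append((v, elevation_km))
    # Hypothetical AGB response with Gaussian noise, for illustration only.
    plots_y.append(200.0 * v + 20.0 * elevation_km + rng.gauss(0.0, 10.0))

# Compare several neighborhood sizes, mirroring the need to tune parameters.
for k in (1, 3, 5, 9):
    print(f"k={k}: leave-one-out RMSE = {loo_rmse(plots_x, plots_y, k):.1f}")
```

The loop over `k` reflects the abstract's point that results depend on parameterization: the same data can rank techniques differently depending on how each one is tuned. A full comparison would repeat this evaluation for SVM, RF, and MLR on the same plots and predictors.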
