Influence of Variable Selection and Forest Type on Forest Aboveground Biomass Estimation Using Machine Learning Algorithms

Forest biomass is a major store of carbon and plays a crucial role in the regional and global carbon cycle. Accurate forest biomass assessment is important for monitoring and mapping the status of and changes in forests. However, although remote sensing-based forest biomass estimation is well developed and extensively used, improving its accuracy remains challenging. In this paper, we used China’s National Forest Continuous Inventory data and Landsat 8 Operational Land Imager data in combination with three algorithms, namely linear regression (LR), random forest (RF), and extreme gradient boosting (XGBoost), to establish biomass estimation models based on forest type. In the modeling process, two methods of variable selection, i.e., stepwise regression and a variable importance-based method, were used to select optimal variable subsets for LR and for the machine learning algorithms (i.e., RF and XGBoost), respectively. The accuracy of the models was significantly improved, and the following conclusions were drawn: (1) Variable selection is very important for improving model performance, especially for machine learning algorithms, and the influence of variable selection on XGBoost is significantly greater than that on RF. (2) Machine learning algorithms have advantages in aboveground biomass (AGB) estimation, and the XGBoost and RF models significantly improved the estimation accuracy compared with the LR models. Although the problems of overestimation and underestimation were not fully eliminated, the XGBoost algorithm worked well and reduced these problems to a certain extent. (3) Modeling AGB by forest type is an advantageous approach for improving performance at the lower and higher values of AGB. Some conclusions in this paper may differ for other study areas. The methods used in this paper provide a useful option for improving the accuracy of AGB estimation based on remote sensing data, and the AGB estimates provide a reference basis for monitoring the forest ecosystem of the study area.
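
The idea of modeling AGB by forest type, rather than with one pooled model, can be illustrated with a minimal sketch. This is not the paper's implementation: it uses a single hypothetical spectral predictor, simple least-squares lines in place of LR, RF, or XGBoost, and six made-up plots whose values are assumptions chosen only to make the contrast visible. The function names (`fit_line`, `pooled_rmse`, `by_type_rmse`) are likewise illustrative, and the error is computed in-sample with no validation split.

```python
from collections import defaultdict
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical toy plots: (forest type, spectral predictor, AGB in Mg/ha).
# These values are invented for illustration and are not from the paper.
plots = [
    ("coniferous", 0.2, 40.0), ("coniferous", 0.4, 60.0), ("coniferous", 0.6, 80.0),
    ("broadleaf", 0.2, 90.0), ("broadleaf", 0.4, 130.0), ("broadleaf", 0.6, 170.0),
]


def pooled_rmse(plots):
    """Fit one line to all plots, ignoring forest type."""
    xs = [x for _, x, _ in plots]
    ys = [y for _, _, y in plots]
    a, b = fit_line(xs, ys)
    return rmse([a + b * x for x in xs], ys)


def by_type_rmse(plots):
    """Fit a separate line per forest type, then score all plots together."""
    groups = defaultdict(list)
    for forest_type, x, y in plots:
        groups[forest_type].append((x, y))
    predicted, observed = [], []
    for members in groups.values():
        xs = [x for x, _ in members]
        ys = [y for _, y in members]
        a, b = fit_line(xs, ys)
        predicted.extend(a + b * x for x in xs)
        observed.extend(ys)
    return rmse(predicted, observed)


rmse_pooled = pooled_rmse(plots)
rmse_by_type = by_type_rmse(plots)
print(f"pooled RMSE:   {rmse_pooled:.2f} Mg/ha")
print(f"by-type RMSE:  {rmse_by_type:.2f} Mg/ha")
```

In this toy data the two forest types follow different predictor-to-AGB relationships, so the pooled line overestimates one type and underestimates the other, while the per-type fits do not. Real inventory data would need a held-out validation set before drawing any such comparison.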
