A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables

Light Detection and Ranging (LiDAR) is a remote sensing technology able to capture three-dimensional information. In recent years, environmental models of forest areas have benefited from LiDAR-derived information. Multiple linear regression (MLR) with prior stepwise feature selection is the most common method in the literature for developing such models: MLR defines the relationship between a set of field measurements and the statistics extracted from a LiDAR flight. Machine learning has emerged as a suitable tool for improving on classic stepwise MLR results with LiDAR data. Unfortunately, few studies have compared the quality of the many available machine learning approaches. This paper compares the classic MLR-based methodology with machine learning regression techniques (neural networks, support vector machines, nearest neighbour, and ensembles such as random forests), with special emphasis on regression trees. The selected techniques are applied to real LiDAR data from two areas in the province of Lugo (Galicia, Spain). The results confirm that classic MLR is outperformed by the machine learning techniques; specifically, our experiments suggest that Support Vector Regression with Gaussian kernels statistically outperforms the rest of the techniques.
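The stepwise MLR baseline can be sketched as follows. This is a minimal, dependency-free illustration, not the procedure used in the study. The abstract does not state the entry or exit criterion, so several details are assumptions: the forward-only search (classic stepwise procedures may also drop variables), the adjusted-R² stopping rule with threshold `min_gain`, the helper names (`solve`, `fit_ols`, `adjusted_r2`, `forward_stepwise`) and the toy data standing in for LiDAR-derived statistics.

```python
"""Minimal sketch of forward stepwise multiple linear regression (MLR).

Assumptions (not from the source): forward-only selection, an adjusted-R^2
stopping rule, and hypothetical toy data in place of LiDAR statistics.
"""
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors)")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float],
            cols: List[int]) -> Tuple[List[float], float]:
    """Fit y ~ intercept + X[:, cols] via the normal equations; return (coefs, RSS)."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss


def adjusted_r2(rss: float, tss: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors plus an intercept."""
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))


def forward_stepwise(X, y, min_gain: float = 1e-3) -> List[int]:
    """Greedily add the predictor that most improves adjusted R^2."""
    n, n_feat = len(y), len(X[0])
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    selected: List[int] = []
    best_score = 0.0  # adjusted R^2 of the intercept-only model
    while len(selected) < n_feat and n - len(selected) - 2 > 0:
        candidates = []
        for c in range(n_feat):
            if c in selected:
                continue
            try:
                _, rss = fit_ols(X, y, selected + [c])
            except ValueError:
                continue  # skip predictors collinear with those already chosen
            candidates.append((adjusted_r2(rss, tss, n, len(selected) + 1), c))
        if not candidates:
            break
        score, col = max(candidates)
        if score - best_score < min_gain:
            break  # no remaining predictor improves the model enough
        selected.append(col)
        best_score = score
    return selected


# Hypothetical toy data: three predictor columns (stand-ins for plot-level
# LiDAR statistics); y is an exact linear function of columns 0 and 2.
X = [[1, 3, 2], [2, 1, 7], [3, 4, 1], [4, 1, 8],
     [5, 5, 2], [6, 9, 8], [7, 2, 1], [8, 6, 8]]
y = [2 + 3 * r[0] - 1.5 * r[2] for r in X]

selected = forward_stepwise(X, y)
print(selected)  # -> [0, 2]
```

Column 0 enters first because it has the highest single-predictor adjusted R². Column 2 then yields an exact fit, and column 1 adds nothing, so the search stops. With real field plots the residuals are never zero, and the choice of `min_gain` (or an F-test/p-value criterion) directly controls how many LiDAR metrics enter the model.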
