Evolutionary feature selection to estimate forest stand variables using LiDAR

Abstract: Light detection and ranging (LiDAR) has become an important tool in forestry. LiDAR-derived models are mostly developed by means of multiple linear regression (MLR) after stepwise selection of predictors. Interest in machine learning and evolutionary computation has recently grown as a way to improve regression in LiDAR data processing. Although evolutionary machine learning has already proven suitable for regression, evolutionary computation may also be applied to improve parametric models such as MLR. This paper presents a hybrid approach based on the joint use of MLR and a novel genetic algorithm for the estimation of the main forest stand variables. We compare our genetic approach with other common methods of selecting predictors. The results obtained from several LiDAR datasets with different pulse densities in two areas of the Iberian Peninsula indicate that the genetic algorithm performs statistically better than the other methods. Preliminary studies suggest that the failure of field data to meet parametric conditions and the possible misuse of parametric tests may be the main reasons for the better performance of the genetic algorithm. This research confirms the findings of previous studies that outline the importance of evolutionary computation in the context of LiDAR analysis of forest data, especially when fieldwork datasets are small.
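To illustrate the general idea of pairing MLR with a genetic search over predictor subsets, the following is a minimal sketch and not the algorithm proposed in the paper, whose operators and fitness function are not given in this abstract. It assumes a generic genetic algorithm (binary tournament selection, uniform crossover, bit-flip mutation, elitism) and a BIC-style fitness computed from an ordinary least-squares fit. The names `fitness` and `ga_select` and all parameter values are hypothetical.

```python
import numpy as np


def fitness(X, y, mask):
    """BIC-style score of an OLS fit on the selected columns (lower is better).

    The choice of BIC as the fitness is an assumption for illustration.
    """
    n = len(y)
    k = int(mask.sum())
    if k == 0:
        return np.inf  # an empty model is never preferred
    A = np.column_stack([np.ones(n), X[:, mask]])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n + 1e-12) + (k + 1) * np.log(n)


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Search predictor subsets with a simple generic GA (assumed operators)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    # Each individual is a boolean mask: True means the predictor is used.
    pop = rng.random((pop_size, n_feat)) < 0.5
    # Start from the full model as the best solution seen so far.
    best_mask = np.ones(n_feat, dtype=bool)
    best_score = fitness(X, y, best_mask)
    for _ in range(generations):
        scores = np.array([fitness(X, y, ind) for ind in pop])
        i = int(np.argmin(scores))
        if scores[i] < best_score:
            best_mask, best_score = pop[i].copy(), scores[i]
        # Binary tournament selection: the better of two random individuals.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] <= scores[b], a, b)]
        # Uniform crossover with a randomly shuffled mate.
        mates = parents[rng.permutation(pop_size)]
        cross = rng.random((pop_size, n_feat)) < 0.5
        children = np.where(cross, parents, mates)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, n_feat)) < p_mut
        # Elitism: keep the best mask found so far in the population.
        children[0] = best_mask
        pop = children
    return best_mask, best_score
```

In a LiDAR setting, `X` would hold plot-level metrics derived from the point cloud and `y` a field-measured stand variable. Because the search starts from the full model and only accepts improvements, the returned score is never worse than that of the MLR fitted on all predictors.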
