Comparing Generalized Linear Models and random forest to model vascular plant species richness using LiDAR data in a natural forest in central Chile

Abstract

Biodiversity is considered an essential element of the Earth system, driving important ecosystem services. However, conserving biodiversity in a rapidly changing world is a challenging task that requires cost-efficient and precise monitoring systems. In the present study, we examined the suitability of airborne discrete-return LiDAR data for mapping vascular plant species richness in a sub-Mediterranean second-growth native forest ecosystem. Vascular plant richness was modeled for four response variables (total, tree, shrub and herb richness) using twelve LiDAR-derived variables. Because species richness values are count data, the asymmetry and heteroscedasticity of their error distribution must be taken into account. We therefore compared the suitability of random forest (RF) and a Generalized Linear Model (GLM) with a negative binomial error distribution. Both models were coupled with a feature selection approach to identify the most relevant LiDAR predictors and keep the models parsimonious. RF and GLM agreed that the three most important predictors for all four richness variables were altitude above sea level, the standard deviation of slope and mean canopy height. This is consistent with the expected basis of LiDAR's suitability for estimating species richness, namely its capacity to capture three types of information: micro-topographical, macro-topographical and canopy structural. GLMs generally performed better (r2: 0.66, 0.50, 0.52, 0.50; nRMSE: 16.29%, 19.08%, 17.89%, 21.31% for total, tree, shrub and herb richness, respectively) than RF (r2: 0.55, 0.33, 0.45, 0.46; nRMSE: 18.30%, 21.90%, 18.95%, 21.00% for total, tree, shrub and herb richness, respectively). Furthermore, the best GLMs were more parsimonious (three predictors) and less biased than the best RF models (twelve predictors).
We attribute this to the non-symmetric error distribution of the species richness values mentioned above, which RF cannot properly capture. From an ecological perspective, the predicted patterns agreed well with the known vegetation composition of the area. We found especially high species numbers at low elevations and along riversides. In these areas, the distributions of thermophilic sclerophyllous species, water-demanding Valdivian evergreen species and species of Nothofagus obliqua forests overlap. The three main conclusions of the study are: 1) appropriate model selection is crucial when working with biodiversity count data; 2) the application of RF to data with non-symmetric error distributions is questionable; and 3) structural and topographic information derived from LiDAR data is useful for predicting local plant species richness.
