Machine Learning Using Hyperspectral Data Inaccurately Predicts Plant Traits Under Spatial Dependency

Spectral, temporal and spatial dimensions are difficult to model together when predicting in situ plant traits from remote sensing data. Therefore, machine learning algorithms based solely on the spectral dimension are often used as predictors, even when the data are strongly affected by spatial or temporal autocorrelation. A significant reduction in prediction accuracy is expected when algorithms are trained on a sequence in space or time that is unlikely to be observed again. The resulting inability to generalise makes ground-truth data necessary for every new area or period, leading to the proliferation of “single-use” models. This study assesses the impact of spatial autocorrelation on the generalisation of plant trait models built on hyperspectral data. Leaf Area Index (LAI) data generated at increasing levels of spatial dependency are used to simulate hyperspectral data with Radiative Transfer Models. Machine learning regressions that predict LAI at each level of spatial dependency are then tuned (that is, their optimum model complexity is determined) using cross-validation as well as the Naïve Overfitting Index Selection (NOIS) method. The results show that cross-validated prediction accuracy tends to be overestimated when spatial structures present in the training data are fitted (or learned) by the model.
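The overestimation described above can be illustrated with a toy simulation. The sketch below is not the study's workflow: it replaces the Radiative Transfer Model with a hypothetical linear mixture of a weak LAI signal and a spatially structured nuisance term, replaces the tuned machine learning regressions with a plain k-nearest-neighbour predictor, and does not implement NOIS. All function names, parameter values and the one-dimensional transect layout are assumptions made for illustration only. It compares randomly assigned folds with spatially contiguous (blocked) folds, with and without spatial dependency in the simulated data.

```python
import numpy as np


def smooth_field(rng, n, window):
    """Moving average of white noise, standardised; window=1 gives no spatial dependency."""
    noise = rng.standard_normal(n + window - 1)
    field = np.convolve(noise, np.ones(window) / window, mode="valid")  # length n
    return (field - field.mean()) / field.std()


def simulate(n=600, window=25, n_bands=5, seed=0):
    """Hypothetical transect: LAI plus pseudo-spectra (NOT a radiative transfer model)."""
    rng = np.random.default_rng(seed)
    lai = 3.0 + smooth_field(rng, n, window)
    # Spatially structured nuisance per band (e.g. background effects), independent of LAI.
    nuisance = np.column_stack([smooth_field(rng, n, window) for _ in range(n_bands)])
    sensor_noise = 0.05 * rng.standard_normal((n, n_bands))
    spectra = 0.3 * lai[:, None] + nuisance + sensor_noise
    return lai, spectra


def knn_predict(train_x, train_y, test_x, k=3):
    """Predict with the mean response of the k nearest training spectra."""
    dist = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)


def cv_rmse(x, y, folds, k=3):
    """Cross-validated RMSE for a given assignment of samples to folds."""
    sq_err = np.empty(len(y))
    for fold in np.unique(folds):
        test = folds == fold
        pred = knn_predict(x[~test], y[~test], x[test], k)
        sq_err[test] = (pred - y[test]) ** 2
    return float(np.sqrt(sq_err.mean()))


def compare(window, n=600, n_folds=5, seed=0):
    """Return (RMSE with random folds, RMSE with contiguous spatial blocks)."""
    lai, spectra = simulate(n=n, window=window, seed=seed)
    rng = np.random.default_rng(seed + 1)
    random_folds = rng.permutation(n) % n_folds    # spatial neighbours end up in different folds
    block_folds = np.arange(n) * n_folds // n      # each fold is one contiguous stretch
    return cv_rmse(spectra, lai, random_folds), cv_rmse(spectra, lai, block_folds)


for w in (1, 25):
    rmse_random, rmse_block = compare(window=w)
    print(f"window={w:2d}  random-fold RMSE={rmse_random:.2f}  blocked RMSE={rmse_block:.2f}")
```

In this sketch the spectra carry spatial structure that is unrelated to LAI, so a flexible predictor can match held-out samples to their spatial neighbours whenever folds are assigned at random. Holding out contiguous blocks removes those neighbours from the training set, which is one simple way to approximate prediction in a new area.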
