A Random Forest Machine Learning Approach for the Retrieval of Leaf Chlorophyll Content in Wheat

Developing rapid and non-destructive methods for chlorophyll estimation over large spatial areas is a topic of much interest, as it would provide an indirect measure of plant photosynthetic response, be useful in monitoring soil nitrogen content, and offer the capacity to assess vegetation structural and functional dynamics. Traditional methods, such as direct tissue analysis or handheld meters, cannot capture chlorophyll variability beyond point scales, so they are of limited use for informing decisions on plant health and status at the field scale. Examining the spectral response of plants via remote sensing has shown much promise as a means to capture variations in vegetation properties, while offering a non-destructive and scalable approach to monitoring. However, determining the optimum combination of spectra or spectral indices to inform plant response remains an active area of investigation. Here, we explore the use of a machine learning approach to enhance the estimation of leaf chlorophyll (Chlt), defined as the sum of chlorophyll a and b, from spectral reflectance data. Using an ASD FieldSpec 4 Hi-Res spectroradiometer, 2700 individual leaf hyperspectral reflectance measurements were acquired from wheat plants grown across a gradient of soil salinity and nutrient levels in a greenhouse experiment. The extractable Chlt was determined from laboratory analysis of 270 collocated samples, each composed of three leaf discs. A random forest regression algorithm was trained against these data, with input predictors based upon (1) reflectance values from 2102 bands across the 400–2500 nm spectral range; and (2) 45 established vegetation indices. As a benchmark, a standard univariate regression analysis was performed to model the relationship between measured Chlt and the selected vegetation indices.
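The univariate benchmark described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the reflectance and Chlt values are made-up placeholders, and the normalized-difference index on the 750 nm and 705 nm bands is an assumed example of a red-edge index, since the abstract does not list the 45 indices used.

```python
from math import sqrt
from statistics import fmean


def normalized_difference(r_a, r_b):
    """Generic normalized-difference index: (r_a - r_b) / (r_a + r_b)."""
    return (r_a - r_b) / (r_a + r_b)


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    my = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical leaf samples (NOT data from the study):
# (reflectance at 750 nm, reflectance at 705 nm, Chlt in ug/cm^2)
samples = [
    (0.48, 0.30, 18.0),
    (0.50, 0.26, 27.0),
    (0.51, 0.22, 35.0),
    (0.52, 0.19, 44.0),
    (0.53, 0.16, 52.0),
    (0.54, 0.14, 61.0),
]

index = [normalized_difference(r750, r705) for r750, r705, _ in samples]
chl = [c for _, _, c in samples]

# Univariate model: Chlt as a linear function of one vegetation index.
slope, intercept = fit_line(index, chl)
pred = [slope * v + intercept for v in index]
print(f"RMSE = {rmse(chl, pred):.2f} ug/cm^2, R2 = {r_squared(chl, pred):.3f}")
```

Repeating the fit for each candidate index and comparing the resulting RMSE values gives the kind of per-index baseline against which a multivariate model can be judged.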
Results show that the root mean square error (RMSE) was significantly reduced when using the machine learning approach compared to standard linear regression. When exploiting the entire spectral range of individual bands as input variables, the random forest estimated Chlt with an RMSE of 5.49 μg·cm⁻² and an R² of 0.89. Model accuracy was improved when using vegetation indices as input variables, producing an RMSE ranging from 3.62 to 3.91 μg·cm⁻², depending on the particular combination of indices selected. In further analysis, input predictors were ranked according to their importance level, and a step-wise reduction in the number of input features (from 45 down to 7) was performed. This reduction had no significant effect on the RMSE, showing that much the same prediction accuracy could be obtained with a smaller subset of indices. Importantly, the random forest regression approach identified many important variables that were not good predictors according to their linear regression statistics. Overall, the research illustrates the promise of using established vegetation indices as input variables in a machine learning approach for the enhanced estimation of Chlt from hyperspectral data.

Remote Sens. 2019, 11, 920; doi:10.3390/rs11080920 www.mdpi.com/journal/remotesensing
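The importance-ranked, step-wise reduction of predictors described in the abstract could look roughly like this with scikit-learn. It is a sketch under assumptions, not the authors' pipeline: the data are synthetic (six made-up predictors rather than the 45 indices), and the forest settings, the impurity-based importance measure, the 5-fold cross-validation, and the helper names `cv_rmse`, `rank_features`, and `stepwise_reduction` are illustrative choices that the abstract does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def cv_rmse(X, y, seed=0):
    """RMSE computed from the mean 5-fold cross-validated MSE."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    return float(np.sqrt(mse.mean()))


def rank_features(X, y, seed=0):
    """Column positions sorted from most to least important."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)
    return list(np.argsort(model.feature_importances_)[::-1])


def stepwise_reduction(X, y, min_features=2, seed=0):
    """Drop the least important predictor one at a time, recording CV RMSE."""
    keep = list(range(X.shape[1]))
    history = []
    while True:
        history.append((list(keep), cv_rmse(X[:, keep], y, seed)))
        if len(keep) <= min_features:
            break
        order = rank_features(X[:, keep], y, seed)
        # Re-rank on the current subset, then discard the last-ranked column.
        keep = [keep[i] for i in order[:-1]]
    return history


# Synthetic stand-in for an index table (NOT data from the study):
# only columns 0 and 1 carry signal; the other four are pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 6))
y = 40.0 * X[:, 0] + 20.0 * X[:, 1] ** 2 + rng.normal(0.0, 1.0, size=120)

history = stepwise_reduction(X, y, min_features=2)
for features, err in history:
    print(len(features), sorted(features), round(err, 2))
```

Comparing the recorded RMSE across steps shows whether accuracy holds as predictors are removed, which is the pattern the abstract reports for the reduction from 45 to 7 indices. Note that the second predictor here enters the target nonlinearly, so a forest can rank it as important even where a straight-line fit would describe it less well.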
