Comparison between Random Forests, Artificial Neural Networks and Gradient Boosted Machines Methods of On-Line Vis-NIR Spectroscopy Measurements of Soil Total Nitrogen and Total Carbon

Accurate and detailed spatial information on within-field soil variability is essential for variable-rate application of farm resources. Soil total nitrogen (TN) and total carbon (TC) are important fertility parameters that can be measured with on-line (mobile) visible and near infrared (vis-NIR) spectroscopy. This study compares local farm-scale calibrations with calibrations built by spiking selected local samples from two fields into a European dataset, for TN and TC estimation, using three modelling techniques: gradient boosted machines (GBM), artificial neural networks (ANN) and random forests (RF). The on-line measurements were carried out with a mobile, fiber-type vis-NIR spectrophotometer (305–2200 nm) (AgroSpec from tec5, Germany), recording soil spectra in diffuse reflectance mode from two fields in the UK. After spectra pre-processing, the datasets were divided into calibration (75%) and prediction (25%) sets, and calibration models for TN and TC were developed using GBM, ANN and RF with leave-one-out cross-validation. In cross-validation, spiking local samples from a field into the European dataset, combined with RF, gave the highest coefficients of determination (R²) of 0.97 and 0.98, the lowest root mean square errors (RMSE) of 0.01% and 0.10%, and the highest residual prediction deviations (RPD) of 5.58 and 7.54 for TN and TC, respectively. In one field, laboratory and on-line predictions generally followed the same trend as cross-validation, with the spiked European dataset-based RF calibration models outperforming the corresponding GBM and ANN models. In the second field, ANN replaced RF as the best-performing technique. The local field calibrations, by contrast, gave lower R² and RPD in most cases. Therefore, from a cost-effectiveness point of view, the spiked European dataset-based RF or ANN calibration models are recommended for predicting TN and TC under on-line measurement conditions.
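The three figures of merit reported above (R², RMSE and RPD) can be computed from paired reference and predicted values. The following is a minimal sketch, not the authors' code: the function name `calibration_metrics` and the sample numbers are hypothetical, R² is assumed to be 1 minus the ratio of residual to total sum of squares, and RPD is assumed to be the sample standard deviation of the reference values divided by the RMSE. The abstract does not state which exact definitions were used.

```python
import math
import statistics


def calibration_metrics(measured, predicted):
    """Return (r2, rmse, rpd) for paired reference and predicted values.

    Assumed definitions (not confirmed by the source):
      r2   = 1 - SS_res / SS_tot
      rmse = sqrt(mean of squared residuals)
      rpd  = sample standard deviation of measured values / rmse
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if len(measured) < 2:
        raise ValueError("at least two samples are required")

    residuals = [m - p for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)

    mean_measured = statistics.fmean(measured)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values must not all be identical")

    rmse = math.sqrt(ss_res / len(measured))
    r2 = 1.0 - ss_res / ss_tot
    # A perfect fit has zero error, so RPD is unbounded in that case.
    rpd = math.inf if rmse == 0 else statistics.stdev(measured) / rmse
    return r2, rmse, rpd


# Hypothetical illustration values, not data from the study.
measured_tc = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_tc = [1.1, 1.9, 3.1, 3.9, 5.1]
r2, rmse, rpd = calibration_metrics(measured_tc, predicted_tc)
print(f"R2={r2:.3f} RMSE={rmse:.3f} RPD={rpd:.2f}")
```

Because RPD scales the error by the spread of the reference values, it is only comparable between datasets when the same standard deviation convention is used; the sample (n - 1) form chosen here is one common option.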
