Development and Update Process of VNIR-Based Models Built to Predict Soil Organic Carbon

The large number of samples, time, and cost required to assess soil organic C (SOC) with standard procedures has led to interest in proximal sensing with visible and near-infrared (VNIR) diffuse reflectance spectroscopy. The objectives of the present study were to (i) evaluate the effect of multivariate techniques and spectra preprocessing methods on the performance of VNIR-based models, (ii) evaluate whether subsetting datasets improves the prediction accuracy of models, and (iii) present a systematic, iterative model development and update process. Three datasets were used: Dataset-1 for initial model development; Dataset-2 to revalidate the models developed with Dataset-1; and Dataset-3 to update the promising models identified with Dataset-1 and -2. During initial model development, Dataset-1 was subset into clusters in an attempt to improve model performance, but subsetting did not improve it. Revalidating the models with Dataset-2 revealed a lack of robustness in the initial models. This lack of robustness is related to the greater sample diversity in Dataset-2 compared with Dataset-1, and it highlights the importance of continuously updating models to cover more variability. Based on the results from Dataset-1 and -2, promising models were updated with the larger and more diverse Dataset-3. Following this update, the best model had a coefficient of multiple determination (R²), root mean squared prediction error (RMSPE), and residual prediction deviation (RPD) of 0.95, 2.062, and 4.39%, respectively. Collecting and evaluating data in separate sets allowed models to be revalidated and updated with new, independent samples. This continuous process provides robust models to end users.
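The three statistics reported above can be illustrated with a minimal sketch. It assumes common definitions: R² as one minus the ratio of residual to total sum of squares, RMSPE as the square root of the mean squared difference between observed and predicted values, and RPD as the sample standard deviation of the observed values divided by RMSPE. The study may have computed these differently (for example, R² as a squared correlation), and the function name `validation_metrics` and the example values are hypothetical.

```python
import math
import statistics


def validation_metrics(observed, predicted):
    """Return R2, RMSPE, and RPD for paired observed/predicted SOC values.

    Assumed definitions (not confirmed by the source):
      R2    = 1 - SSE / SST
      RMSPE = sqrt(SSE / n)
      RPD   = sample SD of observed / RMSPE
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("at least two samples are required")

    n = len(observed)
    mean_obs = statistics.fmean(observed)

    # Residual and total sums of squares
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values have zero variance")

    rmspe = math.sqrt(sse / n)
    r2 = 1.0 - sse / sst
    # A perfect prediction has zero error, so the ratio is unbounded
    rpd = math.inf if rmspe == 0 else statistics.stdev(observed) / rmspe

    return {"r2": r2, "rmspe": rmspe, "rpd": rpd}


if __name__ == "__main__":
    # Hypothetical validation set: laboratory SOC vs. model predictions
    lab_soc = [1.0, 2.0, 3.0, 4.0, 5.0]
    vnir_soc = [1.1, 1.9, 3.2, 3.8, 5.0]
    for name, value in validation_metrics(lab_soc, vnir_soc).items():
        print(f"{name}: {value:.3f}")
```

In a workflow like the one described, the same function would be applied to each independent validation set in turn, so that a drop in RPD or R² on newer, more diverse samples signals that a model needs updating.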
