Combining Partial Least Squares and the Gradient-Boosting Method for Soil Property Retrieval Using Visible Near-Infrared Shortwave Infrared Spectra

Soil spectroscopy is increasingly used for soil property characterisation, and can be applied not only in the laboratory but also from space (imaging spectroscopy). Partial least squares (PLS) regression is one of the most common approaches for calibrating soil properties against soil spectra. Besides functioning as a calibration method, PLS can also be used as a dimension reduction tool, a role that has scarcely been studied in soil spectroscopy. The PLS components retained from high-dimensional spectral data can then be explored further with the gradient-boosted decision tree (GBDT) method. Three soil sample categories were extracted from the Land Use/Land Cover Area Frame Survey (LUCAS) soil library according to land cover type (woodland, grassland, and cropland). First, PLS regression and GBDT were applied separately to build spectroscopic models for soil organic carbon (OC), total nitrogen content (N), and clay for each soil category. Then, PLS-derived components were used as input variables for the GBDT model. The results demonstrate that the combined PLS-GBDT approach performs better than either PLS or GBDT alone. The relative variable importance for soil property estimation revealed by the proposed method indicates that PLS is a useful dimension reduction tool for soil spectra that retains target-related information.
