Comparison of machine learning algorithms for soil salinity predictions in three dryland oases located in Xinjiang Uyghur Autonomous Region (XJUAR) of China

ABSTRACT Machine learning has been applied to a wide range of prediction problems. However, there is limited guidance on which machine learning models and covariate sets, if any, are optimal for predicting soil salinity across different oases in the Xinjiang Uyghur Autonomous Region (XJUAR) of China. This study compared five machine learning algorithms, namely Least Absolute Shrinkage and Selection Operator (LASSO), Multivariate Adaptive Regression Splines (MARS), Classification and Regression Trees (CART), Random Forest tree ensembles (RF), and Stochastic Gradient Treeboost (SGT), for predicting soil salinity in three geographically distinct areas (the Qitai, Kuqa, and Yutian oases). A total of 21 data sets from the three oases were used to evaluate the performance of the algorithms and to screen for the optimal variables. The results identify the following indices as important indicators for the quantitative assessment of soil salinity: EEVI, CSRI, EVI2, GDVI, SAIO, and SIT. The comparison shows that SGT is the most suitable of the five algorithms for predicting soil salinity in arid areas. This study provides a comprehensive comparison of machine learning techniques for soil salinity prediction and may assist in modeling and variable selection for digital soil mapping in the XJUAR of China.
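Two of the indices named above, EVI2 and GDVI, are computed from red and near-infrared (NIR) reflectance. The sketch below shows their commonly published forms as an illustration only: the function names and the sample reflectance values are hypothetical, and the abstract does not state which GDVI exponent or band definitions the study used, so the exponent n = 2 here is an assumption.

```python
def evi2(nir: float, red: float) -> float:
    """Two-band Enhanced Vegetation Index in its commonly published form.

    EVI2 = 2.5 * (NIR - Red) / (NIR + 2.4 * Red + 1)
    Inputs are surface reflectances on a 0-1 scale.
    """
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def gdvi(nir: float, red: float, n: int = 2) -> float:
    """Generalized Difference Vegetation Index.

    GDVI = (NIR^n - Red^n) / (NIR^n + Red^n)
    With n = 1 this reduces to NDVI. The default n = 2 is an
    assumption for illustration, not a value taken from the study.
    """
    nir_n = nir ** n
    red_n = red ** n
    return (nir_n - red_n) / (nir_n + red_n)


if __name__ == "__main__":
    # Hypothetical reflectances for a sparsely vegetated pixel.
    nir, red = 0.5, 0.1
    print("EVI2:", round(evi2(nir, red), 4))
    print("GDVI (n=2):", round(gdvi(nir, red, n=2), 4))
    print("GDVI (n=1, i.e. NDVI):", round(gdvi(nir, red, n=1), 4))
```

In a workflow like the one described, values such as these would be computed per pixel and supplied as candidate covariates to each of the five algorithms.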
