Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran

Estimating soil organic carbon (SOC) content is essential for understanding the chemical, physical, and biological functions of soil. This study evaluates six machine learning algorithms for SOC prediction: support vector machines, artificial neural networks, regression tree, random forest, extreme gradient boosting, and a conventional deep neural network (DNN). The models are trained on 1879 composite surface soil samples with 105 auxiliary variables as predictors. A genetic algorithm is used for feature selection to identify the effective variables. The results indicate that precipitation is the most important predictor, accounting for 15 percent of SOC spatial variability, followed by the normalized difference vegetation index, the day temperature index of the Moderate Resolution Imaging Spectroradiometer (MODIS), multiresolution valley bottom flatness, and land use, in that order. Based on 10-fold cross-validation, the DNN was the best-performing algorithm, with the lowest prediction error and uncertainty. In terms of accuracy, the DNN yielded a mean absolute error of 59 percent, a root mean squared error of 75 percent, a coefficient of determination of 0.65, and a Lin's concordance correlation coefficient of 0.83. SOC content was highest in the udic soil moisture regime class, with a mean value of 4 percent, followed by the aquic and xeric classes. Soils in dense forestlands had the highest SOC contents, whereas soils of younger geological age and soils on alluvial fans had lower SOC. The proposed DNN is a promising algorithm for handling large numbers of auxiliary variables at the province scale. Owing to its flexible structure and its ability to extract more information from the auxiliary data surrounding the sampled observations, it predicted the SOC baseline map with high accuracy and minimal uncertainty.
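The four accuracy measures reported above can be computed from paired observed and predicted SOC values. The following is a minimal sketch, not the authors' implementation: the function name `regression_metrics` is hypothetical, the coefficient of determination is taken as 1 - SSE/SST (one common definition, assumed here), and Lin's concordance correlation coefficient uses population (1/n) variances and covariance.

```python
import numpy as np

def regression_metrics(observed, predicted):
    """Return MAE, RMSE, R2 and Lin's CCC for paired observations and predictions.

    Hypothetical helper for illustration; definitions are assumptions,
    not taken from the study itself.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs

    # Mean absolute error and root mean squared error, in the units of the data
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))

    # Coefficient of determination as 1 - SSE/SST (assumed definition)
    sse = np.sum(err ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    r2 = 1.0 - sse / sst

    # Lin's concordance correlation coefficient:
    # 2*cov / (var_obs + var_pred + (mean_obs - mean_pred)^2)
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    ccc = 2.0 * cov / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2)

    return {"mae": mae, "rmse": rmse, "r2": r2, "ccc": ccc}
```

In a 10-fold cross-validation setting, a function of this kind would typically be applied to the held-out predictions of each fold and the results averaged. Unlike the coefficient of determination, the concordance coefficient also penalizes a systematic offset between predictions and observations.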
