A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models

Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performance of stepwise linear regression (SLR), classification and regression tree (CART), and random forest (RF) models in predicting and mapping the spatial distribution of soil Cd, and to identify likely sources of Cd accumulation in Fuyang County, eastern China. A total of 276 topsoil (0–20 cm) samples were collected, analyzed for Cd, and randomly divided into calibration (222 samples) and validation (54 samples) datasets. Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and to identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest prediction errors, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg. The RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg, and it also exhibited the greatest R² value (0.772). The CART model predictions followed closely, with ME, MAE, RMSE, and R² values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg, and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries, where dramatic industrialization and urbanization have occurred. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results suggest that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale.
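The validation statistics reported above (ME, MAE, RMSE, and R²) can be illustrated with a minimal sketch. The abstract does not state the exact formulas used, so two conventions here are assumptions: residuals are taken as predicted minus observed, and R² is computed as one minus the ratio of residual to total sum of squares (a squared correlation coefficient is another common choice and can give a different value). The function name `error_metrics` and the sample values in the usage note are hypothetical and are not data from the study.

```python
import math


def error_metrics(observed, predicted):
    """Return ME, MAE, RMSE and R2 for paired observed/predicted values.

    Assumed conventions (not confirmed by the source):
      residual = predicted - observed
      R2       = 1 - SS_res / SS_tot
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]

    me = sum(residuals) / n                         # mean error (bias)
    mae = sum(abs(r) for r in residuals) / n        # mean absolute error
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)                    # root mean squared error

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return {"ME": me, "MAE": mae, "RMSE": rmse, "R2": r2}
```

For example, `error_metrics([0.1, 0.2, 0.3, 0.4], [0.2, 0.2, 0.2, 0.4])` evaluates four made-up Cd values in mg/kg. An ME near zero alongside a non-zero MAE, as in the RF results above, indicates that over- and under-predictions largely cancel rather than that individual predictions are exact.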
