Using ensemble models to identify and apportion heavy metal pollution sources in agricultural soils on a local scale.

This study identifies and apportions multi-source, multi-phase heavy metal pollution from natural and anthropogenic inputs in agricultural soils on a local scale, using two ensemble models: stochastic gradient boosting (SGB) and random forest (RF). The pollution sources were assessed quantitatively, and the results show that ensemble models are suitable for this kind of assessment. Both SGB and RF indicated that anthropogenic sources contributed the most to Pb and Cd concentrations in the agricultural soils of the study region, and SGB performed better than RF.
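To make the idea of apportioning sources with a boosted ensemble concrete, the following is a minimal sketch, not the study's implementation. It fits a stochastic gradient boosting model built from one-split regression trees (stumps) in plain Python and reports each predictor's relative influence as its share of the summed squared-error reduction. The data are synthetic, and the three predictors (an anthropogenic proxy, a natural proxy, and an irrelevant variable) as well as all function names and parameter values are assumptions made only for illustration. RF is omitted for brevity.

```python
import random


def best_stump(X, r, rows):
    """Return (gain, feature, threshold, left_mean, right_mean) for the single
    split that most reduces squared error of residuals r on the given rows."""
    n = len(rows)
    total = sum(r[i] for i in rows)
    best = None
    for j in range(len(X[0])):
        order = sorted(rows, key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            if X[order[k]][j] == X[order[k - 1]][j]:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Squared-error reduction relative to predicting the mean.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                thr = (X[order[k]][j] + X[order[k - 1]][j]) / 2
                best = (gain, j, thr, left_sum / k, right_sum / (n - k))
    return best


def fit_sgb(X, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Fit boosted stumps on random subsamples (the 'stochastic' part) and
    return each feature's relative influence as a percentage."""
    rng = random.Random(seed)
    n = len(y)
    pred = [sum(y) / n] * n          # start from the mean concentration
    influence = [0.0] * len(X[0])
    for _ in range(n_trees):
        resid = [y[i] - pred[i] for i in range(n)]
        # Each tree sees only a random subsample, drawn without replacement.
        rows = rng.sample(range(n), max(2, int(subsample * n)))
        stump = best_stump(X, resid, rows)
        if stump is None:
            break
        gain, j, thr, left_mean, right_mean = stump
        influence[j] += gain
        for i in range(n):
            pred[i] += learning_rate * (left_mean if X[i][j] <= thr else right_mean)
    total = sum(influence)
    return [100.0 * v / total for v in influence]


def make_data(n=200, seed=1):
    """Synthetic soils: concentration driven mostly by the first predictor
    (hypothetical anthropogenic proxy), weakly by the second (hypothetical
    natural proxy), and not at all by the third."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(n)]
    y = [3.0 * a + 1.0 * g + rng.gauss(0, 0.1) for a, g, _ in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    names = ["anthropogenic_proxy", "natural_proxy", "irrelevant"]
    for name, share in zip(names, fit_sgb(X, y)):
        print(f"{name}: {share:.1f}%")
```

In this sketch the relative influences sum to 100%, so they can be read as approximate source contributions. With real soil data, the predictors would be measured covariates rather than random numbers, and the shares would depend on how those covariates are chosen and correlated.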
