Spectroscopic Diagnosis of Arsenic Contamination in Agricultural Soils

This study evaluated pre-processing, feature selection, and machine-learning methods for the spectroscopic diagnosis of soil arsenic contamination. The spectral data were pre-processed using Savitzky-Golay smoothing, first and second derivatives, multiplicative scatter correction, standard normal variate, and mean centering. Principal component analysis (PCA) and the RELIEF algorithm were used to extract spectral features. Four machine-learning methods were employed to establish diagnosis models: random forests (RF), artificial neural network (ANN), and radial basis function- and linear function-based support vector machines (RBF-SVM and LF-SVM). Model accuracies were compared using overall accuracy (OA), and the statistical significance of the differences between models was evaluated with McNemar’s test (Z value). The results showed that OA varied with the combination of pre-processing, feature selection, and classification methods. Feature selection methods could improve modeling efficiency and diagnosis accuracy, and RELIEF often outperformed PCA. The optimal models established by RF (OA = 86%), ANN (OA = 89%), RBF-SVM (OA = 89%), and LF-SVM (OA = 87%) did not differ significantly in diagnosis accuracy (Z < 1.96 at the 0.05 significance level). These results indicate that diagnosing soil arsenic contamination with reflectance spectroscopy is feasible, and that an appropriate combination of multivariate methods is important for improving diagnosis accuracy.
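The model comparison described above can be illustrated with a minimal sketch of overall accuracy and McNemar's Z for two classifiers scored on the same validation samples. It assumes the common form Z = (f12 - f21) / sqrt(f12 + f21), where f12 and f21 count the samples classified correctly by only one of the two models; the function names and the toy labels are hypothetical and are not taken from the study.

```python
import math


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the reference class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mcnemar_z(y_true, pred_a, pred_b):
    """McNemar's Z for two classifiers evaluated on the same samples.

    f12: samples model A classifies correctly and model B does not.
    f21: samples model B classifies correctly and model A does not.
    """
    f12 = 0
    f21 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok = a == t
        b_ok = b == t
        if a_ok and not b_ok:
            f12 += 1
        elif b_ok and not a_ok:
            f21 += 1
    if f12 + f21 == 0:
        # The models never disagree in correctness, so there is no evidence
        # of a difference between them.
        return 0.0
    return (f12 - f21) / math.sqrt(f12 + f21)


# Toy labels (1 = contaminated, 0 = uncontaminated); illustrative only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [0, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_b = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

z = mcnemar_z(y_true, pred_a, pred_b)
print(overall_accuracy(y_true, pred_a), overall_accuracy(y_true, pred_b), z)
# |Z| < 1.96 means the difference is not significant at the 0.05 level.
print("significant" if abs(z) >= 1.96 else "not significant")
```

Because the test uses only the samples on which the two models disagree, two models with visibly different OAs can still be statistically indistinguishable when the validation set is small.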