Advantages of fuzzy k-means over k-means clustering in the classification of diffuse reflectance soil spectra: A case study with West African soils

Abstract: The amount of data in soil science has increased exponentially over the last decades, driven by rapid technological innovation. This development has led to a better understanding of soil processes but has also required the introduction of data mining into soil science. With diffuse reflectance infrared Fourier transform (DRIFT) spectroscopy, one of these new methods, soil scientists have been able to build up large spectral libraries. These libraries can span large, heterogeneous areas, so classification algorithms are needed to find subsets or patterns in the data prior to further analysis. The k-means algorithm has become one of the most frequently used algorithms for this task. However, fuzzy k-means (FKM) clustering, a fuzzy variation of k-means, is potentially better suited to spectral data. Fuzzy logic allows classes to overlap and is thought to better reflect the complex nature of soil spectra and the continuous character of environmental variables. In this study, we collected over 1000 mid-infrared DRIFT spectra of agricultural soils from the West African savannah zone and clustered the data using k-means and FKM. Our aim was to explore the feasibility of centroid-based clustering algorithms for finding substructures in spectral data and to discuss the benefits of fuzzy clustering. We found a two-group pattern separating the data set into a northern and a southern part. The clustering could be explained primarily by geology and climatic gradients. While both algorithms performed similarly well in picking up this structure, FKM revealed a transition zone between the two clusters that was not detectable with k-means. This transition zone was explained by a gradual change in aeolian dust deposition, topography, and geology. With this study, we showed the benefits of fuzzy clustering over traditional hard clustering for finding substructure in unexplored spectral data. We recommend the use of continuous classes, as they incorporate more information that could potentially improve subsequent analysis.
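The contrast between hard and fuzzy cluster assignments described in the abstract can be illustrated with a minimal sketch. It does not reproduce the study's data, preprocessing, or software: the "spectra" are synthetic mixtures of two made-up end members, the fuzzifier `m = 2` is an assumed default, and the function names `kmeans` and `fuzzy_kmeans` are hypothetical helpers written only for this example.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Hard k-means: every sample gets exactly one cluster label."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance of every sample to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def fuzzy_kmeans(X, k, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy k-means: every sample gets a membership in [0, 1] per cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)  # memberships of a sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


# Synthetic stand-in for spectra: mixtures of two invented end members.
# t is the mixing fraction; the last 20 samples form a transition zone.
rng = np.random.default_rng(42)
end_a = np.array([1.0, 0.8, 0.2, 0.1, 0.5])
end_b = np.array([0.2, 0.3, 0.9, 0.7, 0.5])
t = np.concatenate([
    rng.uniform(0.0, 0.1, 40),    # close to end member A
    rng.uniform(0.9, 1.0, 40),    # close to end member B
    rng.uniform(0.35, 0.65, 20),  # transition samples
])
X = np.outer(1 - t, end_a) + np.outer(t, end_b)
X += rng.normal(0.0, 0.02, X.shape)

labels, _ = kmeans(X, k=2)
U, _ = fuzzy_kmeans(X, k=2)

# Highest membership per sample: values near 0.5 flag transitional samples
max_membership = U.max(axis=1)
print("mean max membership, core samples:      ", max_membership[:80].mean())
print("mean max membership, transition samples:", max_membership[80:].mean())
```

The hard labels force every transitional sample into one of the two groups, whereas the membership matrix `U` keeps the information that these samples lie between the cluster centers. Thresholding the highest membership is one simple way to flag such samples; the threshold itself would be a further assumption.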
