Application of singular value decomposition (SVD) and semi-discrete decomposition (SDD) techniques in clustering of geochemical data: an environmental study in central Iran

Common multivariate clustering techniques are ineffective at identifying subtle patterns of correlation and at clustering variables or samples within complex geochemical datasets. This study compares a combination of singular value decomposition (SVD) and semi-discrete decomposition (SDD) with hierarchical cluster analysis (HCA) for examining patterns within a multielement soil geochemical dataset from an agricultural area in the vicinity of Pb–Zn mining operations in central Iran. SVD was used both to identify patterns of correlation between variables and samples and to “denoise” the data; SDD was used to cluster the samples and variables simultaneously. The results reveal various spatial associations among the mining waste-associated metals As, Ba, Pb and Zn, and among the remaining elements, whose distribution is largely controlled by the major oxides. SVD–SDD was found to be superior to HCA in its ability to detect subtle clusters in soil geochemistry indicative of mine-related contamination in the study area.
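To make the simultaneous clustering step concrete, the following is a minimal sketch of a greedy semi-discrete decomposition, which approximates a matrix as a sum of terms d · x · yᵀ where x and y take values only in {-1, 0, 1}. It follows the general alternating heuristic described in the SDD literature and is not the authors' implementation. The function names (`best_ternary`, `sdd_term`, `sdd`), the starting heuristic, and the toy matrix are all assumptions made for illustration; the numbers are not data from the study.

```python
def best_ternary(s):
    """Return v in {-1, 0, 1}^n maximizing (v . s)^2 / (number of nonzeros in v)."""
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_count, best_score, running = 1, -1.0, 0.0
    for count, i in enumerate(order, start=1):
        running += abs(s[i])
        score = running * running / count
        if score > best_score:
            best_count, best_score = count, score
    v = [0] * len(s)
    for i in order[:best_count]:
        v[i] = 1 if s[i] >= 0 else -1
    return v


def sdd_term(R, iters=20):
    """Fit one term d * x * y^T to the residual R by alternating updates of x and y."""
    m, n = len(R), len(R[0])
    # Assumed starting heuristic: begin from the column with the largest entry.
    j0 = max(range(n), key=lambda j: max(abs(R[i][j]) for i in range(m)))
    x, y = [0] * m, [1 if j == j0 else 0 for j in range(n)]
    for _ in range(iters):
        s = [sum(R[i][j] * y[j] for j in range(n)) for i in range(m)]
        x_new = best_ternary(s)
        t = [sum(R[i][j] * x_new[i] for i in range(m)) for j in range(n)]
        y_new = best_ternary(t)
        if x_new == x and y_new == y:
            break
        x, y = x_new, y_new
    nx, ny = sum(abs(v) for v in x), sum(abs(v) for v in y)
    d = sum(x[i] * R[i][j] * y[j] for i in range(m) for j in range(n)) / (nx * ny)
    return d, x, y


def sdd(A, k):
    """Greedy rank-k SDD: peel off k terms, subtracting each from the residual."""
    R = [[float(v) for v in row] for row in A]
    terms = []
    for _ in range(k):
        d, x, y = sdd_term(R)
        terms.append((d, x, y))
        for i in range(len(R)):
            for j in range(len(R[0])):
                R[i][j] -= d * x[i] * y[j]
    return terms


# Toy matrix (rows = samples, columns = variables); values are illustrative only.
A = [[5, 5, 0, 0],
     [5, 5, 0, 0],
     [0, 0, -3, -3],
     [0, 0, 0, 0]]
terms = sdd(A, 2)
for d, x, y in terms:
    print(d, x, y)
```

Each term splits the rows by their x entry (+1, 0 or -1) and the columns by their y entry in the same way, so samples and variables are grouped by the same term. In the toy matrix the first term picks out the block formed by the first two rows and first two columns, and the second term picks out the third row with the last two columns. In a workflow like the one described above, the input would be an SVD-denoised (rank-truncated) data matrix rather than raw values.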
