Comparison of Kohonen's Self-Organizing Map algorithm and principal component analysis in the exploratory data analysis of a groundwater quality dataset

Regional monitoring of groundwater chemistry yields large, multivariate data sets. Summarizing the available data, extracting useful information and formulating hypotheses for further research are the key aspects of the exploratory data analysis of such data sets. Traditionally, multivariate statistical techniques such as principal component analysis (PCA) are applied for this purpose. PCA performs a linear dimensionality reduction of the original, high-dimensional data set: it identifies orthogonal directions of maximum variance (principal components), each of which is a linear combination of the correlated original variables [3]. In this study, PCA is compared with the Self-Organizing Map (SOM) algorithm. The SOM algorithm is a neural network that carries out a non-parametric regression in order to represent high-dimensional, nonlinearly related data items in a topology-preserving, often two-dimensional display, and to perform unsupervised classification and clustering [11]. PCA and SOM are applied to a groundwater chemistry data set from a regional monitoring network in two sandy, phreatic aquifers in Central Belgium. Each of the 47 monitoring wells is equipped with three well screens at different depths, in which 14 variables are measured. Both techniques succeed in distinguishing between the two aquifers and reveal the apparent relationships between variables. The main advantage of PCA is that it expresses each variable in terms of the principal components and quantifies the amount of variance explained by each component. The main advantage of the SOM visualization is that it allows a straightforward interpretation of the structure of the data set, in which even nonlinear relationships can be identified. Additionally, unlike PCA, the SOM algorithm can handle a limited number of missing values in the data set.
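The contrast drawn above can be illustrated with a short NumPy sketch. PCA is computed by singular value decomposition of the standardized data, giving the variance explained by each component and the loadings that express each variable in terms of the components; it needs complete rows. The SOM is a plain online Kohonen map on a rectangular grid with a Gaussian neighbourhood, and it handles missing values by ignoring NaN entries both when searching for the best-matching unit and when updating the map, which is one common approach. This is not the authors' implementation: the helper names (`standardize`, `pca`, `bmu`, `train_som`), the 6 × 6 grid, the linear decay schedules and the synthetic two-group data standing in for the two aquifers are all assumptions made for illustration.

```python
import numpy as np


def standardize(X):
    """Z-score each variable, ignoring NaNs (chemistry variables differ in units)."""
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)


def pca(X):
    """PCA of a complete (NaN-free) matrix X (samples x variables) via SVD.

    Returns the explained-variance ratio per component, the loadings
    (correlation of each standardized variable with each component) and
    the sample scores.
    """
    Z = standardize(X)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s**2 / (Z.shape[0] - 1)   # variance along each orthogonal direction
    ratio = var / var.sum()         # share of total variance explained
    loadings = Vt.T * np.sqrt(var)  # variables x components
    scores = Z @ Vt.T               # samples x components
    return ratio, loadings, scores


def bmu(W, x):
    """Best-matching unit for sample x, using only its observed (non-NaN) variables."""
    mask = ~np.isnan(x)
    dist = ((W[:, mask] - x[mask]) ** 2).sum(axis=1)
    return int(np.argmin(dist)), mask


def train_som(Z, rows=6, cols=6, n_iter=2000, seed=0):
    """Online SOM on a rows x cols grid; missing entries are skipped (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    # Start all codebook vectors near the (NaN-aware) mean with small noise.
    W = np.nanmean(Z, axis=0) + 0.1 * rng.standard_normal((rows * cols, Z.shape[1]))
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        sigma = sigma0 + (0.5 - sigma0) * frac  # neighbourhood radius shrinks
        lr = 0.5 + (0.01 - 0.5) * frac          # learning rate decays
        x = Z[rng.integers(len(Z))]
        b, mask = bmu(W, x)
        # Gaussian neighbourhood on the 2-D grid: this is what preserves topology.
        h = np.exp(-((grid - grid[b]) ** 2).sum(axis=1) / (2 * sigma**2))
        W[:, mask] += lr * h[:, None] * (x[mask] - W[:, mask])
    return W, grid


# Synthetic stand-in for two aquifers (NOT the study's data): 14 variables each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (70, 14)), rng.normal(3.0, 1.0, (70, 14))])
labels = np.repeat([0, 1], 70)

ratio, loadings, scores = pca(X)
print("variance explained by PC1:", round(float(ratio[0]), 2))

Xm = X.copy()
Xm[rng.random(X.shape) < 0.05] = np.nan  # remove roughly 5 % of the values
Zm = standardize(Xm)
W, grid = train_som(Zm)
units = np.array([bmu(W, x)[0] for x in Zm])  # map unit of every sample
```

On data like this, the first principal component alone should separate the two groups, and the samples of each group should fall on distinct regions of the map even with values missing; the `pca` function, by contrast, assumes complete rows, so incomplete samples would have to be removed or imputed first.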

[1] Shahrokh Rouhani, et al. Multivariate geostatistical approach to space-time data analysis, 1990.

[2] Keith Turner, et al. Evaluation of graphical and multivariate statistical methods for classification of water chemistry data, 2002.

[3] Fernando Bação, et al. Exploratory data analysis and clustering of multivariate spatial hydrogeological data by means of GEO3DSOM, a variant of Kohonen's Self-Organizing Map, 2006.

[4] Esa Alhoniemi, et al. Self-organizing map in Matlab: the SOM Toolbox, 1999.

[5] P. Laga, et al. Paleogene and Neogene lithostratigraphic units (Belgium), 2002.

[6] Alfred Ultsch, et al. The architecture of emergent self-organizing maps to reduce projection errors, 2005, ESANN.

[7] A. Dassargues, et al. Exploratory data analysis and clustering of multivariate spatial hydrogeological data by means of GEO3DSOM, a variant of Kohonen's Self-Organizing Map, 2022, Hydrology and Earth System Sciences Discussions.

[8] Yoon-Seok Timothy Hong, et al. Intelligent characterisation and diagnosis of the groundwater quality in an urban fractured-rock aquifer using an artificial neural network, 2001.

[9] D. Papamichail, et al. Statistical and trend analysis of water quality and quantity data for the Strymon River in Greece, 2001.

[10] Francisco Sánchez-Martos, et al. Assessment of groundwater quality by means of self-organizing maps: application in a semiarid area, 2002, Environmental Management.

[11] Gwo-Fong Lin, et al. Time series forecasting by combining the radial basis function network and the self-organizing map, 2005.

[12] B. Helena, et al. Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga river, Spain) by principal component analysis, 2000.

[13] J. Join, et al. Using principal components analysis and Na/Cl ratios to trace groundwater circulation in a volcanic island: the example of Reunion, 1997.

[14] J. Cruz, et al. Hydrogeochemistry of thermal and mineral water springs of the Azores archipelago (Portugal), 2006.

[15] H. A. Stiff. The interpretation of chemical water analysis by means of patterns, 1951.

[16] Teuvo Kohonen, et al. Self-Organizing Maps, 2010.

[17] I. Gibson. Statistics and Data Analysis in Geology, 1976, Mineralogical Magazine.

[18] Timothy B. Spruill, et al. Statistical evaluation of effects of riparian buffers on nitrate and ground water quality, 2000.

[19] Arthur M. Piper, et al. A graphic procedure in the geochemical interpretation of water-analyses, 1944.