Non-linear visualization and analysis of large water quality data sets: a model-free basis for efficient monitoring and risk assessment

Environmental monitoring programs produce large multivariate data sets that usually span considerable spatial and temporal variability. The apparent complexity of these data sets calls for sophisticated processing tools. Usually, fixed schemes are followed, including the application of numerical models, which are increasingly implemented in decision support systems. However, these schemes are too rigid to detect unexpected features, such as the onset of subtle trends, non-linear relationships, or patterns restricted to limited sub-samples of the total data set. This study follows an alternative approach based on an efficient non-linear visualization of the data. Visualization is the most powerful interface between computer and human brain. The idea is to apply an efficient, model-free tool, that is, one that requires no prior assumptions about key properties of the data, such as dominant processes. In other words, the data processing aims to preserve as much information as possible and leaves it to the expert to decide which features to analyze in more detail. A comprehensive data set from a 15-year monitoring program in the Lehstenbach watershed was used. The watershed is located in the Fichtelgebirge, a mountainous region in southern Germany, where the land use is forestry. Streamwater and groundwater were monitored at 38 sampling sites for 13 parameters. The data set was analyzed using a self-organizing map (SOM) combined with Sammon’s mapping. The 2D non-linear projection represented 89% of the variance of the data set. The visualization made it easy to detect outliers, to assess spatial versus temporal variance, and to verify a predefined classification of the sampling sites. Contamination of two of the observation wells was detected. Long-term trends of solute concentration in the catchment runoff could be differentiated from short-term dynamics, and a long-term shift in the dynamics was determined for each flow regime individually. This analysis helped considerably to better understand the system’s behavior, to detect “hot spots”, and to organize subsequent analyses of the data efficiently.
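The combination of a SOM with Sammon’s mapping can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the data are synthetic stand-ins for standardized water quality samples, and the map size, learning schedule, PCA initialization, and step-size rule are all assumptions chosen for brevity. The SOM first compresses the samples into a small codebook. Sammon’s mapping then places the codebook vectors in 2D so that their pairwise distances are preserved as well as possible. The reported stress is one common measure of projection quality and is not the variance figure quoted above.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Online SOM training on a rectangular grid; returns the codebook."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize map units from randomly drawn samples (an assumption).
    codebook = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighborhood radius
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook

def pairwise(a):
    """Euclidean distance matrix between the rows of a."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def sammon_stress(d_high, d_low):
    """Sammon's stress computed over the upper triangle of both matrices."""
    iu = np.triu_indices_from(d_high, k=1)
    return ((d_high[iu] - d_low[iu]) ** 2 / d_high[iu]).sum() / d_high[iu].sum()

def pca_2d(points):
    """Project onto the first two principal components (used as a start)."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def sammon(points, n_iter=300, eps=1e-9):
    """Sammon's mapping by gradient descent with a simple adaptive step."""
    d_high = pairwise(points) + eps              # eps guards against zero distances
    c = d_high[np.triu_indices_from(d_high, k=1)].sum()
    y = pca_2d(points)
    stress = sammon_stress(d_high, pairwise(y) + eps)
    step = 1.0
    for _ in range(n_iter):
        d_low = pairwise(y) + eps
        w = (d_high - d_low) / (d_high * d_low)
        grad = (-2.0 / c) * (w[:, :, None] * (y[:, None, :] - y[None, :, :])).sum(axis=1)
        trial = y - step * grad
        trial_stress = sammon_stress(d_high, pairwise(trial) + eps)
        if trial_stress < stress:                # accept only improving steps
            y, stress, step = trial, trial_stress, step * 1.2
        else:
            step *= 0.5
    return y, stress

# Synthetic example: 300 samples of 13 hypothetical parameters in two groups.
rng = np.random.default_rng(1)
samples = rng.normal(size=(300, 13))
samples[:150] += 2.0
samples = (samples - samples.mean(axis=0)) / samples.std(axis=0)  # z-scores

codebook = train_som(samples)
coords, stress = sammon(codebook)
print("Sammon stress of the 2D map:", stress)
```

Plotting `coords` and coloring each map unit by sampling site or date is one way to inspect outliers and spatial versus temporal structure of the kind the abstract describes.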
