Automatic water mixing event identification in the Koljö fjord observatory data

This study addresses the task of automatically identifying water mixing events in the multivariate time series of salinity, temperature and dissolved oxygen provided by the Koljö fjord observatory. The observatory has been collecting data since April 2011 and is used to test new underwater sensor technology and to monitor water quality in the fjord with respect to hypoxia and oxygenation. The water properties of the fjord change when it is affected by inflows of new water originating from the open sea or from rivers connected to the fjord system; these changes manifest as peaks or drops in dissolved oxygen, salinity and temperature. An acute state of oxygen depletion can permanently harm wildlife and the ecosystem. The major challenge for the analysis is that both the peak strength of the water property changes and the correlation between the signals vary strongly. The proposed data-driven analysis method extends existing univariate outlier detection approaches, based on clustering techniques, to identify the water mixing events. It comprises three major steps: (1) smoothing of the input data to counter noise, (2) outlier detection within each variable separately, and (3) clustering of the results with the DBSCAN algorithm to determine the anomalous events. The proposed approach detects the water mixing events with an F1-measure of 0.885, a precision of 0.931 (93.1% of the detected events are actual water mixing events) and a recall of 0.843 (84.3% of the events that should have been found were detected). Using the proposed method, oceanographers can be informed automatically about the status of the fjord without manual interaction or physical presence at the experiment site.
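The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rolling median, the z-score rule, the tiny one-dimensional DBSCAN written out here, all function names, all parameter values (`window`, `threshold`, `eps`, `min_samples`) and the synthetic data are assumptions chosen only for illustration. The last helper checks that the reported precision and recall are consistent with the reported F1-measure.

```python
from statistics import mean, median, pstdev


def rolling_median(values, window=5):
    """Step 1: smooth with a centred rolling median (window shrinks at the edges)."""
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]


def outlier_indices(values, threshold=2.5):
    """Step 2: indices whose z-score within this single variable exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def dbscan_1d(points, eps=2, min_samples=6):
    """Step 3: minimal DBSCAN over 1-D points; one label per point, -1 means noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1                # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)   # j is a core point, keep expanding
    return labels


def detect_events(series_by_name, window=5, threshold=2.5, eps=2, min_samples=6):
    """Pool per-variable outlier times and cluster them into (start, end) events."""
    points = []
    for values in series_by_name.values():
        points.extend(outlier_indices(rolling_median(values, window), threshold))
    points.sort()
    events = {}
    for t, label in zip(points, dbscan_1d(points, eps, min_samples)):
        if label != -1:
            start, end = events.get(label, (t, t))
            events[label] = (min(start, t), max(end, t))
    return sorted(events.values())


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def make_demo_series(n=120):
    """Synthetic stand-in for the observatory data (hypothetical, not real measurements)."""
    wiggle = [((i * 7) % 5 - 2) * 0.05 for i in range(n)]
    event = [40 <= i < 48 for i in range(n)]          # one mixing event in all variables
    oxygen = [2.0 + w + 3.0 * e for w, e in zip(wiggle, event)]
    salinity = [25.0 + w + 2.0 * e for w, e in zip(wiggle, event)]
    temperature = [8.0 + w + 3.0 * e for w, e in zip(wiggle, event)]
    oxygen[70] += 5.0                                 # one-sample glitch
    for i in range(90, 93):
        temperature[i] += 3.0                         # short bump in one variable only
    return {"oxygen": oxygen, "salinity": salinity, "temperature": temperature}


if __name__ == "__main__":
    print(detect_events(make_demo_series()))
    print(round(f1_score(0.931, 0.843), 3))
```

On this synthetic input the sketch reports a single event spanning samples 40 to 47: the one-sample glitch is removed by the smoothing step, and the short temperature-only bump survives outlier detection but is discarded as DBSCAN noise because too few outliers from the other variables lie near it. The F1 check reproduces the reported 0.885 from the reported precision and recall.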
