Recognition and elimination of missing values and outliers from an anaerobic wastewater treatment system using K-Means cluster

A methodology for the exploratory analysis of a large data set from a wastewater treatment system has been developed. Even with properly designed data collection systems, data quality is compromised by discontinuities (gaps) in the data record and by outliers, both of which severely handicap modelling and identification of the process. The aim of this work is to screen data taken from an anaerobic filter for missing values and outliers using K-means and Fuzzy C-means clustering. Both techniques were applied to the multidimensional datasets of the wastewater treatment system to recognize and remove the missing values and outliers. A comparison of K-means with Fuzzy C-means showed that K-means performed well in removing the missing values and outliers. K-means also took less time to screen the data by recognizing and eliminating the discontinuities and outliers.
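As a rough illustration of the screening idea, the sketch below shows one way K-means could be used to recognize and eliminate gaps and outliers. It is not the procedure used in this work. The function names (`is_gap`, `kmeans`, `screen`), the two outlier rules (clusters smaller than `min_size`, and points farther from their centroid than `factor` times the cluster's median distance), the farthest-first initialisation, and the sample readings are all assumptions made for this example. The variables are also assumed to be on comparable scales already.

```python
import math
from statistics import median


def is_gap(v):
    """A value counts as missing if it is None or NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def screen(rows, k=2, min_size=2, factor=3.0):
    """Return (kept, gaps, outliers) as lists of row indices."""
    # Step 1: recognise records with a gap in any variable.
    gaps = [i for i, r in enumerate(rows) if any(is_gap(v) for v in r)]
    complete = [i for i in range(len(rows)) if i not in gaps]
    points = [tuple(float(v) for v in rows[i]) for i in complete]

    # Step 2: cluster the complete records.
    labels, centroids = kmeans(points, k)
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]

    # Step 3: flag outliers (assumed rules, see the text above).
    outliers = []
    for j in range(k):
        members = [n for n, l in enumerate(labels) if l == j]
        if len(members) < min_size:
            outliers += [complete[n] for n in members]  # tiny cluster
            continue
        limit = factor * median(dists[n] for n in members)
        outliers += [complete[n] for n in members if dists[n] > limit]

    kept = [i for i in complete if i not in outliers]
    return kept, gaps, sorted(outliers)


# Hypothetical two-variable readings; row 3 has a gap, row 6 is an outlier.
rows = [
    (7.0, 1.0), (7.1, 1.1), (6.9, 0.9), (7.2, None),
    (7.0, 1.2), (7.1, 0.8), (20.0, 9.0), (6.8, 1.0),
]
kept, gaps, outliers = screen(rows)
print(kept, gaps, outliers)  # [0, 1, 2, 4, 5, 7] [3] [6]
```

In this sketch the isolated reading ends up alone in its own cluster and is removed by the small-cluster rule, while the record with the gap is dropped before clustering. Real plant data would normally need the variables standardised first, and the choice of `k`, `min_size`, and `factor` would have to be tuned to the data set.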