Classification of water for production using parameters in real time

This paper proposes a new method for classifying production water from parameters measured in real time. The method consists of three steps: 1) an initial classification based on the Water Quality Index, computed using the method proposed by KUMAR; 2) feature selection based on random forests (specifically, the varSelRF method); and 3) training of classifiers using different configurations of heuristic decision trees. Four datasets (5090 instances of 8 features each), representative of water samples from Portugal, Canada, Mexico, and Romania, were used to validate the method. The data were grouped into two families of classes: binary (good and regular water) and multiclass (good, regular, and bad water). Final classification accuracy reached 94.85% for the binary family and 91.73% for the multiclass family. The contribution is a continuous monitoring system that detects dramatic changes in water quality in real time and provides tools for historical studies of water-quality behaviour at strategic points.
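
As a rough illustration of step 1, the sketch below labels a sample from a weighted Water Quality Index. It is not KUMAR's formula, which this abstract does not reproduce: the weighted-mean form, the parameter names, the weights, the cut-offs, and the mapping of every non-good sample to "regular" in the binary family are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical weights and class cut-offs, chosen only for illustration;
# they are not taken from the paper or from KUMAR's method.
WEIGHTS: Dict[str, float] = {"ph": 0.3, "turbidity": 0.3, "dissolved_oxygen": 0.4}
GOOD_MIN = 70.0     # assumed cut-off between "good" and "regular"
REGULAR_MIN = 40.0  # assumed cut-off between "regular" and "bad"


def water_quality_index(ratings: Dict[str, float]) -> float:
    """Weighted mean of per-parameter quality ratings on a 0-100 scale."""
    total_weight = sum(WEIGHTS[name] for name in ratings)
    weighted_sum = sum(WEIGHTS[name] * q for name, q in ratings.items())
    return weighted_sum / total_weight


def multiclass_label(wqi: float) -> str:
    """Multiclass family: good, regular, or bad water."""
    if wqi >= GOOD_MIN:
        return "good"
    if wqi >= REGULAR_MIN:
        return "regular"
    return "bad"


def binary_label(wqi: float) -> str:
    """Binary family: good or regular water (assumed: all non-good is regular)."""
    return "good" if wqi >= GOOD_MIN else "regular"


# Example: one sample with hypothetical ratings for three parameters.
sample = {"ph": 90.0, "turbidity": 80.0, "dissolved_oxygen": 85.0}
sample_wqi = water_quality_index(sample)
sample_class = multiclass_label(sample_wqi)
```

Labels produced this way would serve as the targets for the feature selection and decision-tree training in steps 2 and 3.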