Rule Inducted Models for Classifying Water Quality Using Diatoms as Bio-indicators

In this paper we use the property of diatoms as bio-indicators to identify the water quality class (WQC) of a sample, using the CN2 machine learning classification algorithm. Important physico-chemical parameters such as conductivity, oxygen saturation and, most commonly, pH have defined ranges that determine the water class to which a sample belongs. These parameters influence the entire lake food web, disturbing the patterns of organisms such as the diatom community and the interactions between them. Diatom communities have a high coefficient of indication for certain processes, such as eutrophication, which means that they can be used as bio-indicators of water quality. The CN2 algorithm produces rules in IF-THEN form, which is suitable for organizing knowledge from diatom abundance data. In the literature, the ecological preferences of diatoms are organized in the same manner. The experimental setup is built to satisfy not only the properties of the algorithm, but also the ecological knowledge of the diatom community. We used several modifications of the algorithm and compared their classification accuracy and rule quality to determine which experiment proved to be the most accurate and the most general. Several of the rules are presented in this paper together with the performance evaluation. Based on modifications of the CN2 algorithm parameters, we were able to extract knowledge from the data that later proved to be valid or, in some cases, is novel for many newly discovered diatoms. In future work we plan to investigate more modifications of the CN2 algorithm, to implement multi-classification rule induction, and to compare these results with the single-target case.
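As an illustration only, the IF-THEN form mentioned above can be sketched as an ordered rule list in Python. The taxon names, abundance thresholds and class labels are hypothetical placeholders, not rules induced in this study, and the sketch only applies a given rule list to a sample; it does not perform the CN2 search that induces the rules.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A condition is (attribute, comparison, threshold), e.g. ("taxon_A", ">", 10.0).
Condition = Tuple[str, str, float]

OPS = {">": operator.gt, "<=": operator.le}


@dataclass
class Rule:
    """IF all conditions hold THEN predict `label`."""
    conditions: List[Condition]
    label: str

    def covers(self, sample: Dict[str, float]) -> bool:
        # A taxon missing from the sample is treated as abundance 0.0.
        return all(
            OPS[op](sample.get(attr, 0.0), threshold)
            for attr, op, threshold in self.conditions
        )


def classify(sample: Dict[str, float], rules: List[Rule], default: str) -> str:
    """Return the label of the first rule that covers the sample (ordered list)."""
    for rule in rules:
        if rule.covers(sample):
            return rule.label
    return default


# Hypothetical rules: names, thresholds and labels are assumptions for illustration.
RULES = [
    Rule([("taxon_A", ">", 10.0), ("taxon_B", "<=", 2.0)], "class_1"),
    Rule([("taxon_C", ">", 5.0)], "class_2"),
]

sample = {"taxon_A": 14.0, "taxon_B": 0.5, "taxon_C": 7.0}
predicted = classify(sample, RULES, default="class_3")
```

The first matching rule decides the class, which corresponds to an ordered rule list; an unordered rule set would instead collect all covering rules and combine their class distributions.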