In this paper we use the property of diatoms as bioindicators to identify the water quality class (WQC) to which a sample belongs, using the machine learning classification algorithm CN2. Important physical-chemical parameters, such as conductivity, oxygen saturation and the most commonly used one, pH, have defined ranges that determine the water quality class. These parameters influence the entire lake food web, disturbing the patterns of organisms and the interactions between them, including the diatom community. Diatom communities have a high indicator value for certain processes, such as eutrophication, which means that they can be used as bioindicators of water quality. The CN2 algorithm produces rules of the form IF-THEN, which is well suited to organizing knowledge from diatom abundance data; in the literature, the ecological preferences of diatoms are organized in the same manner. The experimental setup is built to satisfy not only the properties of the algorithm but also the ecological knowledge of the diatom community. We used several modifications of the algorithm and compared their classification accuracy and rule quality to determine which experiment produced the most accurate and most general rules. Several of the induced rules are presented in this paper together with their evaluation performance. By modifying the CN2 algorithm parameters, we were able to extract knowledge from the data that was later confirmed to be valid and, in some cases, is novel for many newly discovered diatoms. In future work, we plan to investigate further modifications of the CN2 algorithm, and to implement multi-target classification rule induction and compare its results with the single-target case.
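To make the IF-THEN rule induction concrete, here is a minimal sketch of a simplified ordered CN2 learner: a beam search over conjunctions of attribute = value tests, scored by a pluggable quality function. Two common choices are shown, the Laplace accuracy estimate used in the improved CN2 and an m-estimate, since swapping the heuristic or the beam width is one way of "modifying the algorithm parameters". This is not the authors' implementation and makes several assumptions: it omits CN2's likelihood-ratio significance test, it assumes diatom abundances have already been discretized into categories such as "low"/"high", and the taxa (`taxon_A`, `taxon_B`), class labels and data are hypothetical toy values, not results from the paper. All function names are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]
Complex = Tuple[Tuple[str, str], ...]  # conjunction of (attribute, value) tests
Quality = Callable[[Counter], float]


def make_laplace(n_classes: int) -> Quality:
    """Laplace accuracy of the majority class: (n_c + 1) / (n + k)."""
    def quality(counts: Counter) -> float:
        n = sum(counts.values())
        return (max(counts.values()) + 1) / (n + n_classes)
    return quality


def make_m_estimate(priors: Dict[str, float], m: float) -> Quality:
    """m-estimate of the best class: (n_c + m * p_c) / (n + m)."""
    def quality(counts: Counter) -> float:
        n = sum(counts.values())
        return max((counts[c] + m * p) / (n + m) for c, p in priors.items())
    return quality


def covers(cpx: Complex, ex: Example) -> bool:
    """True if the example satisfies every attribute test in the complex."""
    return all(ex[a] == v for a, v in cpx)


def best_complex(examples: List[Example], selectors: List[Tuple[str, str]],
                 target: str, quality: Quality, beam_width: int) -> Complex:
    """Beam search over conjunctions; returns the highest-quality complex."""
    beam: List[Complex] = [()]
    best, best_q = (), float("-inf")
    while beam:
        seen, scored = set(), []
        for cpx in beam:
            used = {a for a, _ in cpx}
            for sel in selectors:
                if sel[0] in used:
                    continue  # at most one test per attribute
                new = tuple(sorted(cpx + (sel,)))
                if new in seen:
                    continue  # same conjunction reached from another parent
                seen.add(new)
                covered = [e[target] for e in examples if covers(new, e)]
                if not covered:
                    continue
                q = quality(Counter(covered))
                scored.append((q, new))
                if q > best_q:  # on ties, the complex found first is kept
                    best, best_q = new, q
        scored.sort(key=lambda t: t[0], reverse=True)  # stable sort
        beam = [cpx for _, cpx in scored[:beam_width]]
    return best


def cn2_ordered(examples: List[Example], attributes: List[str], target: str,
                quality: Quality, beam_width: int = 3):
    """Learn an ordered list of IF-THEN rules plus a default class.

    Simplification: no significance test; stops once every example is covered.
    """
    selectors = sorted({(a, e[a]) for e in examples for a in attributes})
    default = Counter(e[target] for e in examples).most_common(1)[0][0]
    rules, remaining = [], list(examples)
    while remaining:
        cpx = best_complex(remaining, selectors, target, quality, beam_width)
        covered = [e[target] for e in remaining if covers(cpx, e)]
        rules.append((cpx, Counter(covered).most_common(1)[0][0]))
        remaining = [e for e in remaining if not covers(cpx, e)]
    return rules, default


def predict(rules, default: str, ex: Example) -> str:
    """Fire the first rule whose IF part covers the example."""
    for cpx, cls in rules:
        if covers(cpx, ex):
            return cls
    return default


# Hypothetical, discretized abundances of two taxa and a water quality class.
data = [
    {"taxon_A": "high", "taxon_B": "low", "wqc": "I"},
    {"taxon_A": "high", "taxon_B": "low", "wqc": "I"},
    {"taxon_A": "high", "taxon_B": "high", "wqc": "II"},
    {"taxon_A": "low", "taxon_B": "high", "wqc": "III"},
    {"taxon_A": "low", "taxon_B": "high", "wqc": "III"},
    {"taxon_A": "low", "taxon_B": "low", "wqc": "II"},
    {"taxon_A": "low", "taxon_B": "low", "wqc": "II"},
]
rules, default = cn2_ordered(data, ["taxon_A", "taxon_B"], "wqc", make_laplace(3))
for cpx, cls in rules:
    print("IF", " AND ".join(f"{a} == {v}" for a, v in cpx), "THEN wqc =", cls)
print("ELSE wqc =", default)
```

With the Laplace estimate and a beam width of 3, the toy run prints four rules, starting with `IF taxon_A == high AND taxon_B == low THEN wqc = I`, and ends with `ELSE wqc = II`. Because the rules form an ordered decision list, each rule is only meaningful for examples not already covered by the rules above it. Passing `make_m_estimate(priors, m)` instead of `make_laplace(3)`, or changing `beam_width`, trades rule accuracy against generality, which is the kind of comparison the abstract describes.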