Continuous-Valued Attributes in Fuzzy Decision Trees

Classical crisp decision trees (DT) are widely applied to classification tasks. Nevertheless, there are still many problems, especially when dealing with numerical (continuous-valued) attributes. Some of these problems can be solved using fuzzy decision trees (FDT). This paper proposes a method for handling continuous-valued attributes with automatically generated (as opposed to user-defined) membership functions. An example covering the decision tree construction and the classification of unseen data is given. The results of crisp and fuzzy decision trees are compared at the end.

1 Continuous-Valued Attributes in TDIDT

The aim of this paper is to combine decision trees (DT) with fuzziness, i.e. to use fuzziness to solve problems in the field of DT. Most real-world applications of classification learning involve continuous-valued attributes. However, classical Top Down Induction of Decision Trees (TDIDT) [6] algorithms only use nominal attributes. (The CART algorithm [1] and later versions of ID3 also deal with continuous-valued attributes.) Therefore, continuous-valued attributes have to be discretized before they are selected, typically by partitioning the range of the attribute into subranges. In principle, a discretization is simply a logical condition (using one or more attributes) that serves to partition the data into at least two subsets. The main question is where to set the so-called cut points that partition the continuous-valued attributes.

Fayyad and Irani [3] suggested a method for selecting the cut point as the midpoint between each successive pair of attribute values, which was later improved by the same authors and by Seidelmann [7]. The basis of the algorithm is a logical condition of the form "A(x) < T": a threshold T is determined, and the test A(x) < T is assigned to the left branch while A(x) >= T is assigned to the right branch. Further investigations reduced the candidate cut points to the boundary points, which always separate two classes [4]. Using entropy (as known from the construction of decision trees [6]), it is decided whether such a boundary point separates the attribute values with maximum information gain and thus becomes a cut point. While constructing the decision tree, algorithms not only have to find the test attribute for each node, but also the "best" split of this attribute. Moreover, Fayyad and Irani [4] generalize the method and show that the split is not necessarily binary.

In our opinion, there is a weakness in the method sketched above: the "final" position of the cut …
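To make the cut-point selection described above concrete, the following minimal sketch (ours, not taken from the cited papers) evaluates the boundary points of a single continuous-valued attribute by information gain, using the test A(x) < T for the left branch and A(x) >= T for the right branch. The function name best_cut_point and the toy data are purely illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values()) if total else 0.0

def best_cut_point(values, labels):
    """Return the threshold T (and its information gain) for one
    continuous-valued attribute, considering only boundary points,
    i.e. midpoints between successive values of different classes."""
    pairs = sorted(zip(values, labels))
    base = entropy([c for _, c in pairs])
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        # only midpoints where the class changes are candidate cut points
        if pairs[i - 1][1] == pairs[i][1] or pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0      # midpoint threshold
        left = [c for v, c in pairs if v < t]          # test A(x) < T
        right = [c for v, c in pairs if v >= t]        # test A(x) >= T
        gain = (base
                - (len(left) / len(pairs)) * entropy(left)
                - (len(right) / len(pairs)) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Illustrative example: six attribute values with two classes
temps  = [18.0, 19.5, 21.0, 22.5, 24.0, 25.5]
labels = ['no', 'no', 'yes', 'yes', 'yes', 'no']
print(best_cut_point(temps, labels))   # -> (20.25, ~0.46)
```

The sketch makes a crisp binary split; the fuzzy approach proposed later in the paper would soften exactly this threshold with membership functions rather than committing to a single cut point.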