Learning acyclic decision trees with Functional Dependency Network and MDL Genetic Programming

One objective of data mining is to discover parent-child relationships among the variables in a domain. Showing the importance of each parent can further improve the quality of decision making. The Bayesian network (BN) is a useful model for multi-class problems and can represent parent-child relationships without cycles, but it cannot show the importance of the parents. In contrast, decision trees state the parents' importance clearly; for instance, the most important parent is placed at the first level. However, decision trees were proposed for single-class problems only; when they are applied to multi-class problems, they are likely to produce cycles that represent tautologies. In this paper, we propose to use MDL genetic programming (MDLGP) and the functional dependency network (FDN) to learn a set of acyclic decision trees (Shum et al., 2005). The FDN is an extension of the BN: it can handle discrete, continuous, interval and ordinal values; it is guaranteed to produce decision trees with no cycles; its learning search space is smaller than that of decision trees; and it can represent higher-order relationships among variables. MDLGP is a robust genetic programming (GP) method proposed for learning the FDN. We also propose a method to derive acyclic decision trees from the FDN. The experimental results demonstrate that the proposed method can successfully discover the target decision trees, which contain no cycles and give accurate classification results.
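To make the cycle problem concrete, the following is a minimal sketch of how one might check whether a set of decision trees, one per target variable, induces a cyclic parent-child graph. It is not the FDN or MDLGP procedure from the paper. The representation (a dictionary mapping each target variable to the set of variables its tree tests) and the names `has_cycle`, `cyclic_trees` and `acyclic_trees` are assumptions made for illustration only.

```python
from typing import Dict, Set

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def has_cycle(parents: Dict[str, Set[str]]) -> bool:
    """Return True if the parent-child graph contains a directed cycle.

    `parents` maps each target variable to the set of variables that its
    decision tree tests (a hypothetical representation, not the paper's).
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    colour = {v: WHITE for v in nodes}

    def visit(v: str) -> bool:
        colour[v] = GREY
        for p in parents.get(v, ()):
            if colour[p] == GREY:
                # p is already on the current path, so we closed a cycle.
                return True
            if colour[p] == WHITE and visit(p):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in nodes)


# The tree for A tests B and the tree for B tests A: a tautological cycle.
cyclic_trees = {"A": {"B"}, "B": {"A"}}

# The tree for C tests A and B, the tree for B tests A: no cycle.
acyclic_trees = {"C": {"A", "B"}, "B": {"A"}}

print(has_cycle(cyclic_trees))
print(has_cycle(acyclic_trees))
```

A depth-first search with three colours is used here because a cycle exists exactly when the search reaches a node that is still on the current path. A learner that guarantees acyclicity, as the FDN is stated to do, would never produce the first kind of input.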

[1] João Gama, et al. Linear tree, 1999, Intell. Data Anal.

[2] Zbigniew Michalewicz, et al. Genetic Algorithms + Data Structures = Evolution Programs, 1992, Artificial Intelligence.

[3] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[4] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[5] Kwong-Sak Leung, et al. Learning functional dependency networks based on genetic programming, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[6] J. R. Quinlan, et al. Data Mining Tools See5 and C5.0, 2004.

[7] Alberto Maria Segre, et al. Programs for Machine Learning, 1994.

[8] Cezary Z. Janikow, et al. Fuzzy decision forest, 2000, PeachFuzz 2000. 19th International Conference of the North American Fuzzy Information Processing Society - NAFIPS (Cat. No.00TH8500).

[9] Atish P. Sinha, et al. An efficient algorithm for generating generalized decision forests, 2005, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[10] Kwong-Sak Leung, et al. Learning non-overlapping rules: A method based on Functional Dependency Network and MDL Genetic Programming, 2006, 2006 IEEE International Conference on Evolutionary Computation.

[11] Gary B. Fogel, et al. Emphasizing extinction in evolutionary programming, 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).

[12] Simon Kasif, et al. A System for Induction of Oblique Decision Trees, 1994, J. Artif. Intell. Res.