Representing Conditional Independence Using Decision Trees

While decision trees are fully expressive in theory, traditional decision trees suffer from the replication problem. This problem makes decision trees large and learnable only when sufficient training data are available. In this paper, we present a new representation model, conditional independence trees (CITrees), to tackle the replication problem from a probability perspective. We propose a novel algorithm for learning CITrees. Our experiments show that CITrees significantly outperform naive Bayes (Langley, Iba, & Thomas 1992), C4.5 (Quinlan 1993), TAN (Friedman, Geiger, & Goldszmidt 1997), and AODE (Webb, Boughton, & Wang 2005) in classification accuracy.
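The abstract does not specify the CITree algorithm itself, but as context for the probability-based baselines it compares against, the following is a minimal sketch of one of them: naive Bayes over categorical attributes with Laplace smoothing. All class names and the toy data are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal categorical naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.n = len(y)
        self.priors = Counter(y)                 # class -> count
        self.classes = sorted(self.priors)
        self.counts = defaultdict(Counter)       # (class, attr index) -> value counts
        self.values = defaultdict(set)           # attr index -> distinct values seen
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[(yi, j)][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, x):
        best, best_p = None, -1.0
        for c in self.classes:
            # P(c) * prod_j P(x_j | c), each term Laplace-smoothed
            p = self.priors[c] / self.n
            for j, v in enumerate(x):
                p *= (self.counts[(c, j)][v] + 1) / (
                    self.priors[c] + len(self.values[j])
                )
            if p > best_p:
                best, best_p = c, p
        return best

# Illustrative usage on a tiny categorical dataset.
X = [["sunny", "hot"], ["sunny", "mild"], ["rain", "mild"], ["rain", "cool"]]
y = ["no", "no", "yes", "yes"]
model = NaiveBayes().fit(X, y)
print(model.predict(["sunny", "hot"]))  # -> no
```

The "conditional independence" in the title refers to exactly the assumption this sketch makes inside each product: attributes are treated as independent given the class, which CITrees apply locally rather than globally.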

[1] Nir Friedman, et al. Bayesian Network Classifiers, 1997, Machine Learning.

[2] Ron Kohavi, et al. The Case against Accuracy Estimation for Comparing Induction Algorithms, 1998, ICML.

[3] Craig Boutilier, et al. Context-Specific Independence in Bayesian Networks, 1996, UAI.

[4] Andrew McCallum, et al. Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data, 2004, J. Mach. Learn. Res.

[5] Harry Zhang, et al. Conditional Independence Trees, 2004, ECML.

[6] Jonathan J. Oliver. Decision Graphs - An Extension of Decision Trees, 1993.

[7] Manfred Jaeger, et al. Probabilistic Decision Graphs - Combining Verification and AI Techniques for Probabilistic Inference, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[8] Pat Langley, et al. An Analysis of Bayesian Classifiers, 1992, AAAI.

[9] Ron Kohavi, et al. Bottom-Up Induction of Oblivious Read-Once Decision Graphs: Strengths and Limitations, 1994, AAAI.

[10] D. Haussler, et al. Boolean Feature Discovery in Empirical Learning, 1990, Machine Learning.

[11] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1991, Morgan Kaufmann Series in Representation and Reasoning.

[12] Nir Friedman, et al. Learning Bayesian Networks with Local Structure, 1996, UAI.

[13] Geoffrey I. Webb, et al. Not So Naive Bayes: Aggregating One-Dependence Estimators, 2005, Machine Learning.

[14] Ian H. Witten, et al. Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, 1999.

[15] Alexander G. Gray, et al. Retrofitting Decision Tree Classifiers Using Kernel Density Estimation, 1995, ICML.

[16] J. Ross Quinlan. C4.5: Programs for Machine Learning, 1992.

[17] Ian H. Witten, et al. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2002, SGMD.

[18] Wray L. Buntine. Theory Refinement on Bayesian Networks, 1991, UAI.

[19] Pedro M. Domingos, et al. Tree Induction for Probability-Based Ranking, 2003, Machine Learning.

[20] Pavel Brazdil, et al. Proceedings of the European Conference on Machine Learning, 1993.