MML Inference of Decision Graphs with Multi-way Joins and Dynamic Attributes

A decision tree is a comprehensible representation that has been widely used in many supervised machine learning domains. But decision trees suffer from two notable problems: replication and fragmentation. One way of solving these problems is to introduce decision graphs, a generalization of the decision tree that addresses the above problems by allowing for disjunctions, or joins. While various decision graph systems are available, all of these systems impose some form of restriction on the proposed representations, often leading either to a new redundancy or to the original redundancy not being removed. Tan and Dowe (2002) introduced an unrestricted representation called the decision graph with multi-way joins, which has improved representational power and is able to use training data more efficiently. In this paper, we resolve the problem of encoding internal repeated structures by introducing dynamic attributes in decision graphs. A refined search heuristic to infer these decision graphs with dynamic attributes using the Minimum Message Length (MML) principle (see Wallace and Boulton (1968), Wallace and Freeman (1987) and Wallace and Dowe (1999)) is also introduced. On both real-world and artificial data, and in terms of both “right”/“wrong” classification accuracy and logarithm-of-probability “bit-costing” predictive accuracy (for binary and multinomial target attributes), our enhanced multi-way join decision graph program with dynamic attributes improves on our Tan and Dowe (2002) multi-way join decision graph program, which in turn significantly outperforms both C4.5 and C5.0. The resultant graphs from the new decision graph scheme are also more concise than the trees produced by both C4.5 and C5.0. We also comment on logarithm of probability as a means of scoring (probabilistic) predictions.
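To make the two scoring criteria above concrete: MML selects the decision graph minimising the length of a two-part message, msgLen = -log2 Pr(graph) - log2 Pr(data | graph), while the "bit-costing" predictive score charges each test item -log2 of the probability the model assigned to the class that actually occurred, rewarding well-calibrated confidence rather than bare right/wrong counts. The following Python sketch illustrates only the bit-costing score; the function name and data layout are illustrative assumptions, not taken from the paper.

    import math

    def bit_cost(predicted_probs, true_classes):
        """Sum of -log2 P(true class) over test items, in bits; lower is better.

        predicted_probs: one dict per test item, mapping class label -> probability
        true_classes:    the corresponding true class labels
        """
        total = 0.0
        for probs, truth in zip(predicted_probs, true_classes):
            total += -math.log2(probs[truth])  # bits charged for this item
        return total

    # A confident correct prediction is cheap and a confident error expensive,
    # unlike right/wrong accuracy, which treats p = 0.51 and p = 0.99 the same.
    preds = [{"yes": 0.9, "no": 0.1}, {"yes": 0.2, "no": 0.8}]
    print(bit_cost(preds, ["yes", "no"]))  # -log2(0.9) - log2(0.8), about 0.474 bits

In practice, leaf class-probability estimates are typically smoothed away from 0 and 1 so that no single test item incurs an infinite bit cost.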

[1] J. Ross Quinlan, C4.5: Programs for Machine Learning, 1992.

[2] Leon Sterling (ed.), AI '92: Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Tasmania, 16-18 November 1992.

[3] C. S. Wallace and P. R. Freeman, Estimation and Inference by Compact Coding, 1987.

[4] Yishay Mansour et al., Boosting Using Branching Programs, J. Comput. Syst. Sci., 2000.

[5] Catherine Blake et al., UCI Repository of machine learning databases, 1998.

[6] Jorma Rissanen et al., MDL-Based Decision Tree Pruning, KDD, 1995.

[7] Peter J. Tan and David L. Dowe, MML Inference of Decision Graphs with Multi-way Joins and Dynamic Attributes, Australian Conference on Artificial Intelligence, 2002.

[8] Riichiro Mizoguchi et al. (eds.), PRICAI 2000: Topics in Artificial Intelligence, Lecture Notes in Computer Science, 2000.

[9] Jiawei Han et al., CPAR: Classification based on Predictive Association Rules, SDM, 2003.

[10] C. S. Wallace et al., Coding Decision Trees, Machine Learning, 1993.

[11] C. S. Wallace and David L. Dowe, Minimum Message Length and Kolmogorov Complexity, Comput. J., 1999.

[12] Arlindo L. Oliveira et al., Using the Minimum Description Length Principle to Infer Reduced Ordered Decision Graphs, Machine Learning, 1996.

[13] Manuela M. Veloso et al., The Lumberjack Algorithm for Learning Linked Decision Forests, PRICAI, 2000.

[14] C. S. Wallace and D. M. Boulton, An Information Measure for Classification, Comput. J., 1968.

[15] Ron Kohavi, Bottom-Up Induction of Oblivious Read-Once Decision Graphs: Strengths and Limitations, AAAI, 1994.

[16] Bob McKay et al. (eds.), AI 2002: Advances in Artificial Intelligence, Lecture Notes in Computer Science, 2002.

[17] Pedro M. Domingos et al., Tree Induction for Probability-Based Ranking, Machine Learning, 2003.

[18] David L. Dowe et al., Message Length as an Effective Ockham's Razor in Decision Tree Induction, International Conference on Artificial Intelligence and Statistics, 2001.

[19] I. J. Good, Corroboration, Explanation, Evolving Probability, Simplicity and a Sharpened Razor, The British Journal for the Philosophy of Science, 1968.

[20] C. S. Wallace and David L. Dowe, MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions, 1997.

[21] J. Rissanen, Modeling by Shortest Data Description, Automatica, 1978.

[22] C. S. Wallace and David L. Dowe, MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions, Stat. Comput., 2000.

[23] Jeffrey S. Simonoff et al., Tree Induction vs. Logistic Regression: A Learning Curve Analysis, J. Mach. Learn. Res., 2001.

[24] Ronald L. Rivest et al., Inferring Decision Trees Using the Minimum Description Length Principle, Inf. Comput., 1989.