Decision Forests with Oblique Decision Trees

Ensemble learning schemes have shown impressive gains in prediction accuracy over single-model schemes. We introduce a new decision forest learning scheme whose base learners are Minimum Message Length (MML) oblique decision trees. Unlike other tree-induction algorithms, MML oblique decision tree learning does not over-grow the inferred trees; the resulting trees therefore tend to be shallow and require no pruning. MML decision trees are known to be resistant to over-fitting and excellent at probabilistic prediction. We also propose a novel weighted averaging scheme that exploits the high probabilistic prediction accuracy of MML oblique decision trees. Experimental results show that the new weighted averaging offers a solid improvement over other averaging schemes, such as majority voting. Our MML decision forest scheme also compares favourably with other ensemble learning algorithms on data sets with binary classes.
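
The abstract does not spell out the weighting rule, but the contrast it draws between majority voting and probability-weighted averaging can be sketched in a few lines. In the minimal Python sketch below, each hypothetical base tree returns a class-probability vector for a test instance; as an illustrative assumption (not the paper's actual scheme), each tree's contribution is weighted by the probability it assigns to its own top class:

```python
import numpy as np

def majority_vote(prob_preds):
    """Combine per-tree class-probability vectors by majority vote.

    prob_preds: array of shape (n_trees, n_classes); each row is one
    tree's predicted class distribution for a single test instance.
    """
    votes = np.argmax(prob_preds, axis=1)  # each tree's hard vote
    return np.bincount(votes, minlength=prob_preds.shape[1]).argmax()

def weighted_average(prob_preds, weights=None):
    """Combine probabilistic predictions by a weighted average.

    The paper's exact weighting is not given in the abstract; here each
    tree is weighted by the confidence of its own prediction (the
    probability it assigns to its top class) as an illustrative choice.
    """
    if weights is None:
        weights = prob_preds.max(axis=1)   # per-tree confidence
    weights = weights / weights.sum()
    avg = weights @ prob_preds             # blended class distribution
    return avg.argmax()

# Toy ensemble of three trees predicting over two classes.
preds = np.array([[0.55, 0.45],   # weak vote for class 0
                  [0.60, 0.40],   # weak vote for class 0
                  [0.05, 0.95]])  # confident vote for class 1
print(majority_vote(preds))       # -> 0 (two hard votes beat one)
print(weighted_average(preds))    # -> 1 (confidence shifts the blend)
```

With these toy numbers, the single confident tree overturns two lukewarm votes under weighted averaging, which is exactly the behaviour a hard majority vote cannot exhibit and the reason a scheme that exploits well-calibrated probabilistic predictions can outperform it.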
