A simple regression-based heuristic for learning model trees

The term "model trees" is commonly used for regression trees that contain a non-trivial model in their leaves. Popular implementations of model tree learners build trees with linear regression models in their leaves, and they use variance reduction as the heuristic for selecting tests during tree construction. In this article, we show that systems employing this heuristic may exhibit pathological behaviour even in quite simple cases. The problem is not visible in the predictive accuracy of the tree, but it reduces the tree's explanatory power. We propose an alternative heuristic that yields equally accurate but simpler trees with better explanatory power, at little or no additional computational cost. The resulting model tree induction algorithm is evaluated experimentally and compared with simpler and more complex approaches on a variety of synthetic and real-world data sets.
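The contrast between variance reduction and a regression-based score can be sketched on a single numeric attribute. This is a minimal illustration under stated assumptions, not the algorithm proposed in this article: the regression-based score used here (the residual sum of squares of a least-squares line fitted in each child) is an assumed stand-in, and the function names `variance_score`, `regression_score` and `best_split` are hypothetical. On a target that is exactly linear in x, variance reduction still reports a large gain for a split at the median, whereas the residual-based score reports no gain at all, even though a single linear leaf already fits the data perfectly.

```python
from typing import Callable, Sequence, Tuple

Score = Callable[[Sequence[float], Sequence[float]], float]


def variance_score(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum of squared deviations of ys from their mean (n * variance); xs is ignored."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def regression_score(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Residual sum of squares of a least-squares line y = a + b*x.

    Assumed stand-in for a regression-based heuristic, for illustration only.
    """
    n = len(xs)
    if n < 2:
        return 0.0  # a single point is fitted exactly
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return variance_score(xs, ys)  # constant x: the best line is the mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def best_split(xs: Sequence[float], ys: Sequence[float],
               score: Score) -> Tuple[float, float]:
    """Return (threshold, gain) for the test x <= threshold that maximizes
    score(parent) - score(left) - score(right)."""
    pairs = sorted(zip(xs, ys))
    parent = score([p[0] for p in pairs], [p[1] for p in pairs])
    best = (float("nan"), float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left, right = pairs[:i], pairs[i:]
        children = (score([p[0] for p in left], [p[1] for p in left])
                    + score([p[0] for p in right], [p[1] for p in right]))
        gain = parent - children
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


if __name__ == "__main__":
    xs = list(range(10))
    ys = [2.0 * x for x in xs]  # target is exactly linear in x
    print(best_split(xs, ys, variance_score))
    print(best_split(xs, ys, regression_score))
```

With `xs = list(range(10))` and `ys = [2.0 * x for x in xs]`, `best_split(xs, ys, variance_score)` returns `(4.5, 250.0)`, while `best_split(xs, ys, regression_score)` returns a gain of `0.0`. A learner that stops splitting when the gain is zero would therefore keep a single linear leaf under the residual-based score, but would introduce a split under variance reduction.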
