Functional Models for Regression Tree Leaves

This paper presents a study of functional models for regression tree leaves. We experimentally evaluate several alternatives to the averages commonly used as leaf predictions in regression trees. We have implemented a regression tree learner (HTL) that can use several alternative models in the tree leaves, and we study the effect of these alternatives on accuracy and computational cost. Experiments on 11 data sets show that the "naive" averages of regression trees can be significantly outperformed. Among the four alternative models evaluated, kernel regressors were usually the most accurate. Our study also indicates that integrating regression trees with other regression approaches can overcome the limitations of the individual methods, both in accuracy and in computational efficiency.
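As a minimal illustration of the idea (not the paper's actual HTL implementation), a kernel leaf model replaces the constant leaf average with a Nadaraya-Watson kernel estimate computed over the training cases that fall in the leaf. The function name, the Gaussian kernel choice, and the fixed bandwidth below are assumptions made for the sketch:

```python
import numpy as np

def kernel_leaf_predict(X_leaf, y_leaf, x_query, bandwidth=1.0):
    """Nadaraya-Watson kernel regression over a leaf's training cases.

    X_leaf: (n, d) array of cases that fell into the leaf.
    y_leaf: (n,) array of their target values.
    x_query: (d,) query point routed to this leaf.
    """
    X_leaf = np.asarray(X_leaf, dtype=float)
    y_leaf = np.asarray(y_leaf, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Gaussian kernel weights based on squared distance to the query point.
    d2 = np.sum((X_leaf - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))

    total = w.sum()
    if total == 0.0:
        # Numerical underflow for very distant cases: fall back to the
        # plain leaf average, i.e. the standard regression-tree prediction.
        return float(y_leaf.mean())
    return float(np.dot(w, y_leaf) / total)
```

With a small bandwidth the prediction approaches the nearest case's target; with a large bandwidth it degenerates toward the ordinary leaf average, so the leaf mean is a limiting case of this model.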
