Clustered Partial Linear Regression

This paper presents a new method for the supervised learning task commonly known as multiple regression. The main distinguishing feature of our technique is its multistrategy approach to this learning task. Before the actual regression modeling takes place, we apply a clustering method to partition the training data into subsets. This pre-clustering stage creates several training subsamples, each containing cases that are near one another in the multidimensional input space. As our experiments show, supervised learning within each of these subsamples is easier and more accurate. We call the resulting method clustered partial linear regression. Prediction with these models is preceded by a cluster membership query for each test case. The cluster membership probabilities of a test case serve as weights in an averaging process that computes the final prediction, combining the predictions of the regression models associated with the clusters to which the test case may belong. We have tested this general multistrategy approach with several regression techniques and observed significant accuracy gains on several data sets. We have also compared our method with bagging, which likewise uses an averaging process to obtain predictions; this experiment showed that the two methods are significantly different. Finally, we present a comparison of our method with several state-of-the-art regression methods, showing its competitiveness.
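
The train-then-weight procedure described in the abstract can be summarized in a short sketch. The Python code below is a minimal illustration of the general idea, not the authors' implementation: it assumes a Gaussian mixture model as the clustering method and ordinary least-squares linear regression as the per-cluster learner, while the paper itself treats both components as pluggable. The class name ClusteredPartialLinearRegression and all parameter choices here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture


class ClusteredPartialLinearRegression:
    """Sketch of the clustered-regression idea: cluster the input
    space, fit one regression model per cluster, and predict with a
    membership-probability-weighted average of the per-cluster models."""

    def __init__(self, n_clusters=3, random_state=0):
        # Stand-in clusterer; any method yielding membership
        # probabilities would fit the scheme described in the paper.
        self.gmm = GaussianMixture(n_components=n_clusters,
                                   random_state=random_state)
        self.models = []

    def fit(self, X, y):
        # Pre-clustering stage: group training cases that are "nearby"
        # in the multidimensional input space.
        labels = self.gmm.fit_predict(X)
        # Fit one (here: linear) regression model per training subsample.
        self.models = [LinearRegression().fit(X[labels == k], y[labels == k])
                       for k in range(self.gmm.n_components)]
        return self

    def predict(self, X):
        # Cluster membership query: P(cluster k | x) for each test case.
        weights = self.gmm.predict_proba(X)                           # (n, K)
        # Predictions of every per-cluster model for every test case.
        preds = np.column_stack([m.predict(X) for m in self.models])  # (n, K)
        # Final prediction: membership-probability-weighted average.
        return (weights * preds).sum(axis=1)


# Illustrative usage on synthetic data with two linear regimes.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 2))
    y = np.where(X[:, 0] > 0, 2 * X[:, 1], -X[:, 1]) + rng.normal(0, 0.1, 500)
    model = ClusteredPartialLinearRegression(n_clusters=2).fit(X, y)
    print(model.predict(X[:5]))
```

Swapping LinearRegression for another regressor reproduces the "general multistrategy" flavor of the approach, and the weights array plays exactly the role of the cluster membership probabilities described above.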
