A Principal Components Approach to Combining Regression Estimates

The goal of combining the predictions of multiple learned models is to form an improved estimator. A combining strategy must robustly handle the inherent correlation, or multicollinearity, among the learned models while identifying the unique contribution of each. A progression of existing approaches and their limitations with respect to these two issues is discussed. A new approach, PCR*, based on principal components regression, is proposed to address these limitations. An evaluation of the new approach on a collection of domains reveals that (1) PCR* was the most robust combining method, (2) correlation could be handled without eliminating any of the learned models, and (3) the principal components of the learned models provided a continuum of “regularized” weights from which PCR* could choose.
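
As a rough illustration of the idea, the sketch below combines a matrix of model predictions via principal components regression. The names (pcr_combine, select_k), the hold-out selection of the number of components, and all parameters are assumptions for illustration, not the paper's actual PCR* procedure.

```python
import numpy as np

def pcr_combine(preds, y, k):
    """Fit a k-component principal components regression of y on the
    (n_samples, n_models) prediction matrix; return model weights
    and an intercept. Assumes the top-k singular values are nonzero."""
    p_mean, y_mean = preds.mean(axis=0), y.mean()
    P, t = preds - p_mean, y - y_mean
    # SVD of the centered predictions; rows of Vt are the principal
    # directions in "model space".
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    coef = (U[:, :k].T @ t) / s[:k]   # least-squares coefs in PC space
    w = Vt[:k].T @ coef               # map back to weights on the models
    return w, y_mean - p_mean @ w

def select_k(preds, y, holdout=0.3, seed=0):
    """Pick the number of components by held-out squared error, a
    stand-in for whichever selection criterion PCR* actually uses."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(len(y) * (1 - holdout))
    tr, va = idx[:cut], idx[cut:]
    best_k, best_err = 1, np.inf
    for k in range(1, preds.shape[1] + 1):
        w, b = pcr_combine(preds[tr], y[tr], k)
        err = np.mean((preds[va] @ w + b - y[va]) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

Truncating to the leading components shrinks the influence of directions in which the models' predictions nearly coincide, which is how correlation can be handled without discarding any individual model; sweeping k from 1 to the number of models yields the continuum of regularized weight vectors the abstract refers to.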
