Managing Multiple Models

Recent research in model selection and adaptive modeling has produced an embarrassment of riches. By using any one of several different techniques, an analyst is able to generate a number of models that describe the same data set well. Examples include multiple tree models generated by bootstrapping or stochastic searches, and different subsets of variables in linear regression models identified by stochastic or exhaustive searches. While model averaging can use these models to improve prediction accuracy, interpretation of the resultant models becomes difficult. We seek a compromise, developing measures of dissimilarity between different models and using these to select good models which may reveal different aspects of the data. Data on housing prices in Boston are used to illustrate this in the context of treed regression models.
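The idea of measuring dissimilarity between models and then picking a few good but distinct ones can be sketched as follows. This is a minimal illustration under stated assumptions, not the measures developed in the paper: the dissimilarity used here (root-mean-square difference between two models' fitted values on the same observations), the greedy farthest-point selection rule, and the names `fit_distance`, `distance_matrix`, and `select_diverse` are all hypothetical choices made for this example.

```python
import math


def fit_distance(preds_a, preds_b):
    """Assumed dissimilarity: RMS difference between two models' fitted values."""
    if len(preds_a) != len(preds_b):
        raise ValueError("both models must be evaluated on the same observations")
    squared = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    return math.sqrt(squared / len(preds_a))


def distance_matrix(all_preds):
    """Symmetric matrix of pairwise dissimilarities between models."""
    n = len(all_preds)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = fit_distance(all_preds[i], all_preds[j])
    return dist


def select_diverse(all_preds, errors, k):
    """Pick up to k models: the lowest-error model first, then repeatedly
    the model farthest (in minimum distance) from those already chosen."""
    dist = distance_matrix(all_preds)
    indices = range(len(errors))
    chosen = [min(indices, key=lambda i: errors[i])]
    while len(chosen) < min(k, len(errors)):
        remaining = [i for i in indices if i not in chosen]
        farthest = max(remaining, key=lambda i: min(dist[i][j] for j in chosen))
        chosen.append(farthest)
    return chosen


# Made-up fitted values from four hypothetical models on five observations.
preds = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.0, 4.0, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [1.0, 2.0, 3.0, 4.0, 5.2],
]
errors = [0.30, 0.31, 0.35, 0.32]  # made-up error estimates, one per model

print(select_diverse(preds, errors, 2))
```

In practice the candidate set would first be restricted to models that fit well, so that the selection trades off only among good models; that filtering step is omitted here for brevity.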
