Prediction in M-complete Problems with Limited Sample Size

Abstract. We define a new Bayesian predictor called the posterior weighted median (PWM) and compare its performance to several other predictors, including the Bayes model average under squared error loss, the Barbieri-Berger median model predictor, the stacking predictor, and the model average predictor based on Akaike's information criterion. We argue that PWM generally gives better performance than the other predictors over a range of M-complete problems, namely those lying between the M-closed/M-complete boundary and the M-complete/M-open boundary. Indeed, as a problem gets closer to M-open, it seems that M-complete predictive methods begin to break down. Our comparisons rest on extensive simulations and real data examples.
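To make the contrast concrete, the following is a minimal sketch of a posterior-weighted median next to the Bayes model average. It assumes (since the abstract does not give the formal definition) that PWM takes the weighted median of the candidate models' point predictions using posterior model probabilities as weights; the predictions and posterior weights below are illustrative, not from the paper.

```python
# Hypothetical sketch: posterior weighted median (PWM) vs. Bayes model average (BMA).
# The weighting scheme is an assumption, not the paper's exact definition.

def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= 0.5 * total:
            return v
    return pairs[-1][0]

def bayes_model_average(values, weights):
    """Posterior-weighted mean of the candidate models' predictions."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

# Three candidate models predict a future observation; the third is an outlier.
preds = [1.0, 2.0, 10.0]
post = [0.2, 0.5, 0.3]  # assumed posterior model probabilities

print(weighted_median(preds, post))      # -> 2.0 (robust to the outlying model)
print(bayes_model_average(preds, post))  # -> 4.2 (pulled toward it)
```

The toy example illustrates why a median-type combination can behave differently from the mean-type BMA: a single badly wrong model with nontrivial posterior weight drags the average but not the weighted median.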
