Prequential analysis of complex data with adaptive model reselection

In prequential analysis, an inference method is viewed as a forecasting system, and its quality is judged by the quality of its predictions. This is an alternative to more traditional statistical approaches that focus on inferring parameters of the data-generating distribution. In this paper, we introduce adaptive combined average predictors (ACAPs) for the prequential analysis of complex data. That is, we form the predictor at each time step in a sequence as a convex combination of two different model averages. A novel feature of our strategy is that the models in each average are re-chosen adaptively at each time step. To assess the complexity of a given data set, we introduce measures of data complexity for continuous-response data, validating them in several simulated contexts before applying them to real data examples. We compare the performance of ACAPs with that of predictors based on stacking or likelihood-weighted averaging, across several model classes and in both simulated and real data sets. Our results suggest that ACAPs achieve a better tradeoff between model-list bias and model-list variability when the data are very complex. This implies that the choice of model class and averaging method should be guided by a notion of complexity matching, i.e., the analysis of a complex data set may require a more complex model class and averaging strategy than the analysis of a simpler one. We propose that complexity matching is akin to a bias-variance tradeoff in statistical modeling.
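Below is a minimal sketch of the ACAP idea in the prequential setting: at each time step t, each of two model averages is rebuilt from the data seen so far (with its model list re-chosen), the point prediction is the convex combination y_hat = w * p1 + (1 - w) * p2 with w in [0, 1], and w is updated after y_t is observed and scored. The polynomial model lists, the squared-error weight update, and the helper names (reselect_models, average_prediction) are hypothetical stand-ins chosen for illustration, not the authors' implementation; in the paper each component is itself a weighted model average, whereas equal weights are used here purely for brevity.

    import numpy as np

    rng = np.random.default_rng(0)

    def reselect_models(X, y, candidate_degrees, k=2):
        # Re-choose the k polynomial models with the smallest in-sample
        # residual sum of squares; a stand-in for adaptive model-list
        # reselection at each time step.
        scored = []
        for d in candidate_degrees:
            coefs = np.polyfit(X, y, d)
            rss = float(np.sum((np.polyval(coefs, X) - y) ** 2))
            scored.append((rss, coefs))
        scored.sort(key=lambda s: s[0])
        return [coefs for _, coefs in scored[:k]]

    def average_prediction(models, x):
        # Equal-weight model average at a single point x (brevity only).
        return float(np.mean([np.polyval(c, x) for c in models]))

    # Simulated prequential stream: predict y_t before observing it, score
    # the prediction, then update the convex weight on the two averages.
    X_stream = np.linspace(0.0, 1.0, 200)
    y_stream = np.sin(6.0 * X_stream) + 0.3 * rng.standard_normal(200)

    w, eta, cum_loss = 0.5, 0.05, 0.0   # convex weight, step size, loss
    for t in range(20, len(X_stream)):
        X_past, y_past = X_stream[:t], y_stream[:t]
        # Re-choose the models in each average from the data seen so far.
        avg1 = reselect_models(X_past, y_past, candidate_degrees=[1, 2, 3])
        avg2 = reselect_models(X_past, y_past, candidate_degrees=[4, 5, 6])
        p1 = average_prediction(avg1, X_stream[t])
        p2 = average_prediction(avg2, X_stream[t])
        y_hat = w * p1 + (1.0 - w) * p2              # convex combination
        cum_loss += (y_hat - y_stream[t]) ** 2       # prequential score
        # Shift weight toward whichever average predicted better, in [0, 1].
        e1, e2 = (p1 - y_stream[t]) ** 2, (p2 - y_stream[t]) ** 2
        w = float(np.clip(w + eta * (e2 - e1), 0.0, 1.0))

    print(f"cumulative squared error: {cum_loss:.2f}, final w: {w:.2f}")

The two candidate-degree lists play the role of two model classes of different complexity, so the final value of w gives a rough indication of which class the stream favored.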
