Comparing Model Comparison Methods

Holger Schultheis (schulth@informatik.uni-bremen.de)
Cognitive Systems, University of Bremen, Enrique-Schmidt-Str. 5, 28359 Bremen, Germany

Ankit Singhaniya
Computer Science and Engineering, NIT Nagpur, Nagpur 440010, India

Devendra Singh Chaplot
Computer Science and Engineering, IIT Bombay, Mumbai 400076, India

Abstract

Comparison of the ability of different computational cognitive models to simulate empirical data should ideally take into account the complexity of the compared models. Although several comparison methods are available that are meant to achieve this, little information on the differential strengths and weaknesses of these methods is available. In this contribution we present the results of a systematic comparison of 5 model comparison methods. Employing model recovery simulations, the methods are examined with respect to their ability to identify the model that actually generated the data across 3 pairs of models and a number of comparison situations. The simulations reveal several interesting aspects of the considered methods such as, for instance, the fact that in certain situations the methods perform worse than model comparison neglecting model complexity. Based on the identified method characteristics, we derive a preliminary recommendation on when to use which of the 5 methods.

Keywords: computational cognitive models, model comparison, model mimicry, model generalization

When computationally modeling cognition, several different models are often available or conceivable as explanations for the cognitive ability in question. In such a situation, the aim is to select the best of these candidate models according to a set of criteria. Among others (e.g., falsifiability or interpretability), the extent to which the different models are able to simulate observed human behavior is usually considered a key criterion for selecting from the candidate models.

A naive approach to gauging the models' ability to simulate the existing observations is to fit each model to the available data and choose the model that provides the tightest fit as indicated, for instance, by the models' Root Mean Squared Error (RMSE). Such an approach is problematic, because it does not take into account the complexity of the compared models. As a result, there is a tendency toward overfitting and toward selecting more complex models even if simpler models provide the better explanation of the considered cognitive ability (Pitt & Myung, 2002); a brief sketch of this naive selection is given below.

Several methods that take model complexity into account have been proposed to avoid the pitfalls of the naive approach (see Shiffrin, Lee, Kim, & Wagenmakers, 2008, for an overview). However, common use of such more sophisticated model comparison methods is partly hampered by the fact that many properties of the different methods are insufficiently investigated. Only very few studies (e.g., Cohen, Sanborn, & Shiffrin, 2008) have systematically examined different comparison methods with respect to their differential advantages and disadvantages. Consequently, modelers who need to compare models regarding their ability to simulate human behavior often face the problem that it is unclear which model comparison method can reasonably, and should ideally, be employed in a given situation.
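To make the naive approach concrete, the following minimal sketch (illustrative only; the data and the two polynomial stand-in models are hypothetical and not those examined in this paper) fits a simple and a flexible candidate to the same data and selects by raw RMSE. Because the candidates are nested, the more complex one never fits worse, so the naive criterion favors it even though the simple model generated the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 noisy observations generated by a simple linear process.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(scale=0.3, size=x.size)

def rmse(pred, obs):
    """Root Mean Squared Error between predictions and observations."""
    return np.sqrt(np.mean((pred - obs) ** 2))

# Two stand-in "candidate models": a 2-parameter line and an 8-parameter
# degree-7 polynomial, each fit to the data by least squares.
simple_pred = np.polyval(np.polyfit(x, y, deg=1), x)
complex_pred = np.polyval(np.polyfit(x, y, deg=7), x)

fits = {"simple": rmse(simple_pred, y), "complex": rmse(complex_pred, y)}
best = min(fits, key=fits.get)  # naive selection: tightest fit wins
print(fits, "->", best)
# The complex model never fits the data it was fitted to worse than the
# nested simple model, so it is selected despite not having generated the data.
```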
In this contribution we present the results of a systematic comparison of 5 model comparison methods. The methods are examined with respect to their ability to select the model that actually generated the data across 3 pairs of models and a number of contextual variations (e.g., tightness of fits, amount of noise in the data). The obtained results highlight important properties of the different comparison methods. Together with the fact that all 5 considered methods are general in the sense that they place no restrictions on the type of models that can be compared, these results are, we believe, conducive to increasing the frequency with which more sophisticated comparison methods, instead of the naive approach, will be employed for model evaluation and comparison.

The remainder of this article is structured as follows. First, we list and briefly describe all considered methods. Second, the employed models, contextual variations, and procedural details of the method comparison are described. Subsequently, comparison results are presented and discussed before we conclude our considerations and highlight topics for future work.

Methods

The 5 methods we compared are the bootstrap, the bootstrap with standard error (SE) and confidence interval (CI), the data-uninformed parametric bootstrap cross-fitting method, henceforth called the cross-fitting method (CM), the simple holdout, and the prediction error difference method (PED). Each of these was applied to 3 pairs of models and will be described in turn below.

Bootstrap

Given a set of n observations, the bootstrap method of model comparison proceeds as follows (see Efron & Tibshirani, 1993, for an overview of bootstrapping procedures). First, an arbitrary but fixed number B of bootstrap samples is generated. A bootstrap sample is a set of n data points randomly drawn with replacement from the n original observations. Due to sampling with replacement, most bootstrap samples will contain only a subset of all original observations.
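The resampling step just described can be sketched in a few lines. The function below is an illustrative sketch (not taken from the paper; the name `bootstrap_samples` and the example data are assumptions) that generates B bootstrap samples of size n by drawing with replacement from the original observations.

```python
import numpy as np

def bootstrap_samples(observations, B, seed=None):
    """Generate B bootstrap samples, each of size n, by drawing with
    replacement from the n original observations."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(observations)
    n = obs.shape[0]
    # Draw n indices with replacement for each of the B samples; because of
    # the replacement, most samples contain only a subset of the original
    # observations (some points appear repeatedly, others not at all).
    idx = rng.integers(0, n, size=(B, n))
    return obs[idx]

# Example: 10 hypothetical observations, B = 1000 bootstrap samples.
samples = bootstrap_samples(np.arange(10), B=1000, seed=1)
print(samples.shape)  # (1000, 10)
```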
References

[1] Holger Schultheis. Decision Criteria for Model Comparison Using Cross-Fitting. 2013.
[2] R. Tibshirani, et al. Improvements on Cross-Validation: The 632+ Bootstrap Method. 1997.
[3] I. J. Myung, et al. When a good fit can be bad. Trends in Cognitive Sciences, 2002.
[4] Michael D. Lee, et al. A Survey of Model Evaluation Approaches With a Tutorial on Hierarchical Bayesian Methods. Cognitive Science, 2008.
[5] Roger Ratcliff, et al. Assessing model mimicry using the parametric bootstrap. 2004.
[6] Wessel N. van Wieringen, et al. Testing the prediction error difference between 2 predictors. Biostatistics, 2009.
[7] Adam N. Sanborn, et al. Model evaluation using grouped or individual data. Psychonomic Bulletin & Review, 2008.
[8] J. Busemeyer, et al. Model Comparisons and Model Selections Based on Generalization Criterion Methodology. Journal of Mathematical Psychology, 2000.
[9] H. Akaike, et al. Information Theory and an Extension of the Maximum Likelihood Principle. 1973.
[10] M. Kenward, et al. An Introduction to the Bootstrap. 2007.
[11] Neal Madras. Lectures on Monte Carlo Methods. 2002.