Towards a Categorization-Based Model of Similarity

Steven Verheyen (steven.verheyen@psy.kuleuven.be)
Gert Storms (gert.storms@psy.kuleuven.be)
Department of Psychology, University of Leuven
Tiensestraat 102, B-3000 Leuven, Belgium

Abstract

Most accounts of categorization assume the categorization decision for an item to be independent of the categorization decisions for other items. A number of methods are brought to bear on the question of whether this assumption is justified. These methods involve the application of a formal categorization model that explicitly incorporates the independence assumption to categorization data and the subsequent investigation of the residuals for unexplained structure. The residuals reveal multiple departures from independence, suggesting that the independence assumption in many a categorization account should be relaxed. Following this suggestion, the applied formal model is extended to allow for dependent categorization decisions. It is explained how the extended model might address the concern that categorization accounts have erred in using similarity as an explanatory construct. It promises to be a significant step towards a categorization-based model of similarity.

Keywords: categorization; similarity; threshold theory.

Introduction

Similarity is arguably the variable that is most often invoked to explain categorization decisions. According to most accounts of categorization an item is believed to be a category member if its representation sufficiently resembles the category representation (regardless of whether the latter is believed to be an abstracted summary representation, an instantiated set of representative exemplars, an ideal, or a coherent theory). The Threshold Theory of categorization, for instance, posits that prior to making a categorization decision the similarity between the item’s representation and the category’s representation is compared against an internal threshold (Hampton, 2007). If the assessed similarity exceeds the threshold, the item will be endorsed as a category member; otherwise it will not.

Most accounts of categorization will make the additional assumption that consecutive categorization decisions are made independently from one another. It is believed that every new item that is encountered for categorization will invoke the same similarity-assessment procedure that earlier items have. That is, participants will provide a categorization decision by determining whether the new item’s representation sufficiently resembles the category’s representation. The answer that is provided on this particular categorization trial is thus believed to be arrived at independently from the decisions that were made earlier (and the ones that await). In the framework of the Threshold Theory, for instance, every new categorization decision entails a comparison of the item-category similarity against the internal threshold, without regard to the decisions for alternate items.

This assumption of independence might prove too strong, particularly in the context of the tasks that are employed to study categorization in natural language categories. The list of items that is presented for categorization is generally a mix of clear members, borderline members, and clear nonmembers of the target category. However, even the clear nonmembers of the target category in these tasks are chosen to be at least somewhat related to the category’s most prototypical items. For instance, the nonmembers included for categorization in a category like vegetables tend to be other food or plant items. Employing items from the animals or artifacts domains instead would presumably render the task less ecologically valid and (perhaps more to the point) might detract from the similarity-based processes we intend to study with these tasks. Importantly, when all the items that make up a set of potential category members are drawn from a single domain (be it animals, artifacts, foods, activities, ...) it is likely that meaningful similarity relations exist among them. These similarity relations might impose structure on the corresponding categorization decisions that has been neglected in treating these decisions as independent from one another. For instance, in categorizing items as vegetables or not, one can imagine participants consistently giving the same response to parsley as to sage when their similarity as herbs is recognized.

In what follows we will employ various methods to establish that the data that result from the traditional categorization task violate the assumption of independence. These methods originate from the item response models literature, where departures from independence are known as Local Item Dependencies (LIDs). Rather than considering LIDs as nuisances that one is better off eliminating (which is common practice in the item response models literature), we will relate the LIDs to ratings of item-item similarity to argue that they are a substantial part of categorization decisions which future accounts of categorization will have to incorporate. The case for the existence and importance of LIDs in categorization will be made by means of a reanalysis of previously published categorization data using the Rasch model (Rasch, 1960). The reasons for using the Rasch model to introduce one of the shortcomings of many current categorization accounts are threefold. (i) The model naturally accounts for the inter-individual differences in categorization that are characteristic of the natural language categories we study (Verheyen, Hampton, & Storms, 2010). (ii) Since the Rasch model is an item response model it is straightforward to apply existing methods for detecting LIDs to it. (iii) A Rasch-like model that accommodates the need to incorporate LIDs offers the intriguing possibility of deriving similarity from categorization, instead of the other way around.
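The idea of checking residuals for unexplained structure can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes a Rasch-type parameterization in which the probability of endorsing an item is the logistic function of the item's position minus the person's threshold, uses made-up item positions, plugs in the true parameters instead of estimating them, and induces dependence in the crudest possible way (the response to sage copies the response to parsley). All function and variable names are hypothetical.

```python
import math
import random


def endorse_prob(beta_item, theta_person):
    """Rasch-type probability that a person endorses an item as a member.

    Assumed parameterization: logistic(beta_item - theta_person), where
    beta_item is the item's position on the latent similarity scale and
    theta_person is the person's threshold.
    """
    return 1.0 / (1.0 + math.exp(-(beta_item - theta_person)))


def simulate(betas, thetas, copy_pair=None, seed=0):
    """Simulate a persons-by-items matrix of 0/1 categorization decisions.

    Decisions are drawn independently given beta and theta. If copy_pair is
    (i, j), the response to item j is overwritten with the response to item i,
    an extreme and purely illustrative form of local item dependence.
    """
    rng = random.Random(seed)
    data = []
    for theta in thetas:
        row = [int(rng.random() < endorse_prob(b, theta)) for b in betas]
        if copy_pair is not None:
            i, j = copy_pair
            row[j] = row[i]
        data.append(row)
    return data


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_correlation(data, betas, thetas, i, j):
    """Correlate the residuals (observed minus expected) of items i and j."""
    res_i = [row[i] - endorse_prob(betas[i], t) for row, t in zip(data, thetas)]
    res_j = [row[j] - endorse_prob(betas[j], t) for row, t in zip(data, thetas)]
    return pearson(res_i, res_j)


# Hypothetical item positions for a "vegetables" task (made-up values).
ITEMS = ["carrot", "parsley", "sage", "apple"]
BETAS = [2.0, 0.0, 0.0, -2.0]

# Hypothetical person thresholds, drawn from a standard normal distribution.
_rng = random.Random(1)
THETAS = [_rng.gauss(0.0, 1.0) for _ in range(2000)]

independent = simulate(BETAS, THETAS)
dependent = simulate(BETAS, THETAS, copy_pair=(1, 2))

# Under independence the parsley-sage residual correlation should be near 0;
# when sage copies parsley it should be close to 1.
print(residual_correlation(independent, BETAS, THETAS, 1, 2))
print(residual_correlation(dependent, BETAS, THETAS, 1, 2))
```

In a real analysis the item and person parameters would have to be estimated from the data, and dependence between items would be far subtler than outright copying. The point of the sketch is only that a model built on independence leaves correlated residuals behind when the decisions for two similar items are in fact linked.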

[1] J. D. Smith, et al. Prototypes in the Mist: The Early Epochs of Category Learning, 1998.

[2] Benjamin D. Wright, et al. Solving measurement problems with the Rasch model, 1977.

[3] Georg Rasch, et al. Probabilistic Models for Some Intelligence and Attainment Tests, 1981, The SAGE Encyclopedia of Research Design.

[4] D. Andrich, et al. Quantifying Response Dependence Between Two Dichotomous Items Using the Rasch Model, 2010.

[5] L. Rips. Inductive judgments about natural categories, 1975.

[6] Steven Verheyen, et al. A probabilistic threshold model: analyzing semantic categorization data with the Rasch model, 2010, Acta psychologica.

[7] P. Boeck, et al. Explanatory item response models: a generalized linear and nonlinear approach, 2004.

[8] D. Rumelhart, et al. A model for analogical reasoning, 1973.

[9] Andrew Thomas, et al. WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility, 2000, Stat. Comput.

[10] James A. Hampton, et al. Typicality, Graded Membership, and Vagueness, 2007, Cogn. Sci.

[11] D. Medin, et al. The role of theories in conceptual coherence, 1985, Psychological review.

[12] Francis Tuerlinckx, et al. Models for residual dependencies, 2004.

[13] R. Nosofsky. Choice, Similarity, and the Context Theory of Classification, 1984.

[14] P. Groenen, et al. Modern multidimensional scaling, 1996.

[15] Steven Verheyen, et al. Determining the dimensionality in spatial representations of semantic concepts, 2007, Behavior research methods.

[16] Alexander Bird, et al. Natural Kinds, 1988, Philosophy.