Extracting prototypes from exemplars: What can corpus data tell us about concept representation?

Abstract: Over the past four decades, two distinct alternatives to rule-based models of how linguistic categories are stored and represented as cognitive structures have emerged, namely prototype theory and exemplar theory. Although these models were initially thought to be mutually exclusive, shifts from one mechanism to the other have been observed in category learning experiments, bringing the models closer together. In this paper we implement a technique akin to varying abstraction modelling, which assumes that intermediate abstraction processes underlie category representations and categorization decisions; we do so using familiar statistical techniques such as regression and clustering that track frequency distributions in the input. With this model we simulate, on the basis of actual usage of Russian try verbs and Finnish think verbs as observed in corpora, how prototypes for near-synonymous verbs could be formed from concrete exemplars at different levels of abstraction. In so doing, we take a closer look at the cognitive linguistic flirtation with multiple categorization theories and suggest three improvements anchored in the fact that cognitive linguistics is a usage-based theory of language. Firstly, we show that language provides support for considering single-prototype and full-exemplar models as opposite ends of a continuum of abstraction. Secondly, we present a methodology that simulates how prototypes can be obtained from exemplars at more than one level of abstraction in a systematic and verifiable way. Thirdly, we illustrate our claims on the basis of work on verbs, which denote intangible events that are neither stable in nor independent of time and which express relational concepts; this implies that verb meanings are more susceptible to being influenced by the concepts they relate.
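The idea of a continuum of abstraction can be made concrete with a toy sketch. The Python below is not the procedure used in the paper; it assumes a simple bottom-up (centroid-linkage) clustering over invented binary context features, and every function name and data point is hypothetical. It only shows how one routine can return a single prototype (k = 1), the full set of exemplars (k = number of exemplars), or intermediate prototypes in between.

```python
from itertools import combinations


def centroid(vectors):
    """Feature-wise mean of a list of equal-length numeric tuples."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def abstract_prototypes(exemplars, k):
    """Merge exemplars bottom-up until k clusters remain.

    Returns one centroid ("prototype") per cluster. k=1 gives a single
    prototype; k=len(exemplars) returns the exemplars unchanged.
    """
    if not 1 <= k <= len(exemplars):
        raise ValueError("k must be between 1 and the number of exemplars")
    clusters = [[e] for e in exemplars]
    while len(clusters) > k:
        # Pick the pair of clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: sq_dist(centroid(clusters[p[0]]),
                                  centroid(clusters[p[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return [centroid(c) for c in clusters]


# Hypothetical exemplars: each tuple marks the presence (1) or absence (0)
# of three invented contextual features in one corpus sentence.
exemplars = [
    (1, 1, 0), (1, 1, 0), (1, 0, 0),
    (0, 0, 1), (0, 1, 1), (0, 0, 1),
]

single_prototype = abstract_prototypes(exemplars, 1)
two_prototypes = abstract_prototypes(exemplars, 2)
full_exemplars = abstract_prototypes(exemplars, len(exemplars))
```

Varying k moves the representation along the continuum: lower values discard more exemplar detail in favour of averaged feature profiles, higher values retain more of the individual usage events.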
