Exploring lexical co-occurrence space using HiDEx

Hyperspace Analogue to Language (HAL) is a high-dimensional model of semantic space that uses the global co-occurrence frequencies of words in a large text corpus as the basis for a representation of semantic memory. In the original HAL model, many parameters were set without any a priori rationale. We have created and publicly released a computer application, the High Dimensional Explorer (HiDEx), that makes it possible to systematically vary these parameters and examine their effect on the co-occurrence matrix that instantiates the model. We took an empirical approach to understanding the influence of the parameters on the measures the model produces, assessing how well matrices derived with different parameter settings predict human reaction times in lexical decision and semantic decision tasks. New parameter sets yield measures of semantic density that improve the model's ability to predict behavioral measures. Implications for such models are discussed.
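The core computation underlying HAL-family models is a weighted sliding-window count: each word accumulates co-occurrence weight from its neighbors, with closer neighbors contributing more. The sketch below is an illustrative reconstruction of that ramped-window scheme, not the HiDEx implementation itself; the window size and weighting function are exactly the kinds of parameters HiDEx exposes for systematic variation.

```python
from collections import defaultdict

def hal_cooccurrence(tokens, window=5):
    """Build a HAL-style co-occurrence table.

    For each target word, count the words that precede it within
    `window` positions, weighted by (window - distance + 1), so that
    adjacent words contribute the most. Returns a nested dict
    mapping target -> context -> accumulated weight.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break  # ran off the start of the corpus
            counts[target][tokens[j]] += window - d + 1
    return counts

# Toy corpus; a real model would use millions of tokens.
corpus = "the cat sat on the mat the dog sat on the rug".split()
matrix = hal_cooccurrence(corpus, window=2)
```

With `window=2`, weights are 2 for adjacent words and 1 at distance two; e.g. `matrix["sat"]["cat"]` is 2.0 (one adjacent co-occurrence). The rows of such a matrix serve as word vectors, and measures like semantic neighborhood density can then be computed from inter-vector distances.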
