HiDEx: The High Dimensional Explorer

ABSTRACT

HAL (Hyperspace Analog to Language) is a high-dimensional model of semantic space that uses the global co-occurrence frequency of words in a large corpus of text as the basis for a representation of semantic memory. In the original HAL model, many parameters were set without any a priori rationale. In this chapter we describe a new computer application called the High Dimensional Explorer (HiDEx) that makes it possible to systematically alter the values of the model's parameters and thereby to examine their effect on the co-occurrence matrix that instantiates the model. New parameter sets give us measures of semantic density that improve the model's ability to predict behavioral measures. Implications for such models are discussed.

INTRODUCTION

In this chapter we present a new tool for exploring a class of models of lexical semantics derived from HAL (Hyperspace Analog to Language; Burgess, 1998; Burgess & Lund, 2000), a computational model of word meaning that derives semantic relationships from lexical co-occurrence. Although the original HAL model was well specified, it contained several parameters whose values were set without formal or empirical justification. Our freely available implementation of the class of HAL-derived models is called the High Dimensional Explorer (HiDEx). HiDEx allows users to systematically vary its defining parameters, creating models that are algorithmically identical but parameterized differently. In this chapter we explain how HiDEx works and how we have used it to explore HAL's parameter space.
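To make the idea of "algorithmically identical but differently parameterized" models concrete, the following is a minimal sketch of a HAL-style co-occurrence count in Python. It is not HiDEx's implementation. The function name `cooccurrence_counts`, the parameter names `window_size` and `weighting`, and the two weighting schemes shown are assumptions chosen for illustration; the text above does not specify which parameters HiDEx exposes or how it weights co-occurrences.

```python
from collections import defaultdict


def cooccurrence_counts(tokens, window_size=10, weighting="ramp"):
    """Weighted counts of (target, preceding word) pairs.

    Hypothetical illustration only: parameter names and weighting
    schemes are assumptions, not HiDEx's actual interface.
    """
    if weighting not in ("ramp", "flat"):
        raise ValueError("weighting must be 'ramp' or 'flat'")

    counts = defaultdict(float)
    for i, target in enumerate(tokens):
        # Look back over at most `window_size` preceding words.
        for distance in range(1, window_size + 1):
            j = i - distance
            if j < 0:
                break
            if weighting == "ramp":
                # Closer words get larger weights; the nearest word
                # gets `window_size`, the farthest gets 1.
                weight = window_size - distance + 1
            else:
                # Every word inside the window counts equally.
                weight = 1.0
            counts[(target, tokens[j])] += weight
    return dict(counts)


tokens = "the dog chased the cat".split()
ramp = cooccurrence_counts(tokens, window_size=2, weighting="ramp")
flat = cooccurrence_counts(tokens, window_size=2, weighting="flat")
```

The same algorithm runs in both calls; only the parameter values differ, and the resulting matrices differ with them. Comparing such matrices against behavioral measures is the kind of exploration the chapter describes.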

[1]  Curt Burgess,et al.  The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kučera and Francis , 1998 .

[2]  Raymond Y. K. Lau,et al.  Classifying Document Titles Based on Information Inference , 2003, ISMIS.

[3]  Lori Buchanan,et al.  WINDSOR: Windsor improved norms of distance and similarity of representations of semantics , 2008, Behavior research methods.

[4]  Curt Burgess,et al.  Modelling Parsing Constraints with High-dimensional Context Space , 1997 .

[5]  Thierry Bertin-Mahieux,et al.  Automatic Tagging of Audio: The State-of-the-Art , 2011 .

[6]  D. Balota,et al.  Automatic and attentional priming in young and older adults: reevaluation of the two-process model. , 1992, Journal of experimental psychology. Human perception and performance.

[7]  Cédric Sarré,et al.  Technology-Mediated Tasks in English for Specific Purposes (ESP): Design, Implementation and Learner Perception , 2013, Int. J. Comput. Assist. Lang. Learn. Teach..

[8]  Ruohua Zhou,et al.  Music Onset Detection , 2011 .

[9]  Mohammad A. Karim,et al.  Technical Challenges and Design Issues in Bangla Language Processing , 2013 .

[10]  R. H. Baayen,et al.  The CELEX Lexical Database (CD-ROM) , 1996 .

[11]  Lawrence Locker,et al.  Semantic and phonological influences on the processing of words and pseudohomophones , 2003, Memory & cognition.

[12]  T. Landauer,et al.  A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .

[13]  Wenwu Wang,et al.  Machine Audition: Principles, Algorithms and Systems , 2010 .

[14]  László Gönczy,et al.  Ontology-Supported Design of Domain-Specific Languages: A Complex Event Processing Case Study , 2014 .

[15]  Michael N Jones,et al.  Representing word meaning and order information in a composite holographic lexicon. , 2007, Psychological review.

[16]  W. Kintsch,et al.  High-Dimensional Semantic Space Accounts of Priming. , 2006 .

[17]  Curt Burgess,et al.  Explorations in context space: Words, sentences, discourse , 1998 .

[18]  Curt Burgess,et al.  Producing high-dimensional semantic spaces from lexical co-occurrence , 1996 .

[19]  Hercules Dalianis,et al.  Applied Natural Language Processing: Identification, Investigation and Resolution , 2011 .

[20]  Cyrus Shaoul,et al.  Word frequency effects in high-dimensional co-occurrence models: A new approach , 2006, Behavior research methods.

[21]  Congcong Wang,et al.  Teachers' Experience as Foreign Language Online Learners: Developing Teachers' Linguistic, Cultural, and Technological Awareness , 2014 .

[22]  Helmer Strik,et al.  Second Language Learners’ Spoken Discourse: Practice and Corrective Feedback through Automatic Speech Recognition, in Innovative Methods and Technologies for Electronic Discourse Analysis , 2013 .

[23]  Curt Burgess,et al.  Characterizing semantic space: Neighborhood effects in word recognition , 2001, Psychonomic bulletin & review.

[24]  J. Bullinaria,et al.  Extracting semantic representations from word co-occurrence statistics: A computational study , 2007, Behavior research methods.

[25]  B. Murdock A Theory for the Storage and Retrieval of Item and Associative Information. , 1982 .

[26]  Curt Burgess,et al.  From simple associations to the building blocks of language: Modeling meaning in memory with the HAL model , 1998 .

[27]  Chris Westbury,et al.  The effect of semantic distance in yes/no and go/no-go semantic categorization tasks , 2003, Memory & cognition.

[28]  Alain Lifchitz,et al.  Effect of tuned parameters on an LSA multiple choice questions answering model , 2009, Behavior research methods.

[29]  Richard Cole,et al.  Concept learning and information inferencing on a high-dimensional semantic space , 2004 .

[30]  Lori Buchanan,et al.  Grounding co-occurrence: Identifying features in a lexical co-occurrence model of semantic memory , 2009, Behavior research methods.

[31]  Peter Bruza,et al.  Discovering information flow using high dimensional conceptual space , 2001, SIGIR '01.