Word Lookup on the Basis of Associations: from an Idea to a Roadmap

Word access is an obligatory step in language production. To achieve a communicative goal, a speaker or writer needs not only to have something to say but also to find the corresponding word(s). Yet knowing a word, i.e. having it stored in a database or memory (human mind or electronic device), does not imply being able to access it in time. This is clearly a case where computers (electronic dictionaries) can be of great help. In this paper we present our ideas on how an enhanced electronic dictionary can help people find the word they are looking for. The yet-to-be-built resource is based on the age-old notion of association: every idea, concept or word is connected to others. In other words, we assume that people have a highly connected conceptual-lexical network in their mind. Finding a word thus amounts to entering the network at any point by giving the word or concept that comes to mind (source word) and then following the links (associations) leading to the word being sought (target word). Obviously, to allow for this kind of access, the resource has to be built accordingly. This requires at least two things: (a) indexing words by the associations they evoke, and (b) identifying and labeling the most frequent or useful associations. This is precisely our goal. We propose to build an associative network by enriching an existing electronic dictionary, essentially with syntagmatic associations drawn from a corpus that represents the average citizen's shared, basic knowledge of the world (an encyclopedia). Such an enhanced electronic database resembles our mental dictionary in many respects. By combining the power of computers with the flexibility of the human mind (omnidirectional navigation and quick jumps), it emulates to some extent the mind's capacity to navigate quickly and efficiently in a large database.
While the notions of association and spreading activation are fairly old, their use to support word access via computer is new. The resource still needs to be built, and this is not a trivial task. We discuss here some of the strategies and problems involved in accomplishing it with the help of people and computers (automation).
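The lookup idea described above (index words by labeled associations, enter the network at a source word, follow links to the target word) can be illustrated with a minimal sketch. It is not the authors' implementation, since the resource is still to be built. The toy triples, the link labels (`co-occurs_with`, `is_a`, `has_kind`) and the function names are all assumptions made for illustration, and breadth-first search merely stands in for the step-by-step navigation a human user would perform.

```python
from collections import defaultdict, deque

# Hypothetical toy data: (word, link label, associated word) triples.
# In the proposed resource these would come from a dictionary enriched
# with corpus-derived associations; here they are invented examples.
ASSOCIATIONS = [
    ("coffee", "co-occurs_with", "cup"),
    ("coffee", "is_a", "beverage"),
    ("coffee", "co-occurs_with", "black"),
    ("coffee", "has_kind", "mocha"),
    ("beverage", "has_kind", "tea"),
]


def build_index(triples):
    """Index every word by the labeled associations it takes part in.

    Links are stored in both directions (an assumption of this sketch)
    so that the network can be entered and traversed from either end.
    """
    index = defaultdict(list)
    for source, label, target in triples:
        index[source].append((label, target))
        index[target].append((label, source))
    return index


def associations_of(index, word):
    """Group the neighbours of `word` by link label.

    This is roughly what a user would be shown at each navigation step.
    """
    grouped = defaultdict(list)
    for label, neighbour in index.get(word, []):
        grouped[label].append(neighbour)
    return dict(grouped)


def find_path(index, source, target):
    """Return a shortest chain of words from source to target, or None."""
    previous = {source: None}  # word -> word it was reached from
    queue = deque([source])
    while queue:
        word = queue.popleft()
        if word == target:
            path = []
            while word is not None:
                path.append(word)
                word = previous[word]
            return path[::-1]
        for _label, neighbour in index.get(word, []):
            if neighbour not in previous:
                previous[neighbour] = word
                queue.append(neighbour)
    return None
```

With this toy network, a user who can only think of "tea" would reach "mocha" through "beverage" and "coffee". In the envisaged resource the user, rather than a search routine, would choose which labeled link to follow at each step.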
