Testing the Distributional Hypothesis: The Influence of Context on Judgements of Semantic Similarity

Scott McDonald (scottm@cogsci.ed.ac.uk)
Michael Ramscar (michael@cogsci.ed.ac.uk)
Institute for Communicating and Collaborative Systems, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, Scotland

Abstract

Distributional information has recently been implicated as playing an important role in several aspects of language ability. Learning the meaning of a word is thought to be dependent, at least in part, on exposure to the word in its linguistic contexts of use. In two experiments, we manipulated subjects' contextual experience with marginally familiar and nonce words. Results showed that similarity judgements involving these words were affected by the distributional properties of the contexts in which they were read. The accrual of contextual experience was simulated in a semantic space model, by successively adding larger amounts of experience in the form of item-in-context exemplars sampled from the British National Corpus. The experiments and the simulation provide support for the role of distributional information in developing representations of word meaning.

The Distributional Hypothesis

The basic human ability of language understanding – making sense of another person's utterances – does not develop in isolation from the environment. There is a growing body of research suggesting that distributional information plays a more powerful role than previously thought in a number of aspects of language processing. The exploitation of statistical regularities in the linguistic environment has been put forward to explain how language learners accomplish tasks from segmenting speech to bootstrapping word meaning. For example, Saffran, Aslin and Newport (1996) have demonstrated that infants are highly sensitive to simple conditional probability statistics, indicating how the ability to segment the speech stream into words may be realised.
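The conditional-probability statistic at issue can be illustrated with a toy computation. The sketch below estimates the transitional probability P(next syllable | current syllable) from bigram counts; the syllable stream and the function are invented for illustration, not Saffran et al.'s actual materials or method:

```python
from collections import Counter

def transitional_probs(syllables):
    """Estimate P(next | current) for each adjacent syllable pair."""
    bigrams = Counter(zip(syllables, syllables[1:]))
    unigrams = Counter(syllables[:-1])
    return {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}

# Toy stream: the "words" bi-da and ku-pa repeated in varying order.
stream = "bi da ku pa bi da bi da ku pa ku pa bi da ku pa".split()
tp = transitional_probs(stream)
# Within-word transitions (bi->da, ku->pa) have probability 1.0;
# between-word transitions (da->ku, pa->bi, ...) are lower, so dips
# in transitional probability mark candidate word boundaries.
```

On this toy stream, tp[("bi", "da")] and tp[("ku", "pa")] come out at 1.0 while the cross-boundary transitions fall below 1.0, which is exactly the asymmetry a segmentation learner could exploit.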
Adults, when faced with the task of identifying the word boundaries in an artificial language, also appear able to readily exploit such statistics (Saffran, Newport & Aslin, 1996). Redington, Chater and Finch (1998) have proposed that distributional information may contribute to the acquisition of syntactic knowledge by children. Useful information about the similarities and differences in the meaning of words has also been shown to be present in simple distributional statistics (e.g., Landauer & Dumais, 1997; McDonald, 2000). Based on the convergence of these recent studies into a cognitive role for distributional information in explaining language ability, we call the general principle under exploration the Distributional Hypothesis. The purpose of the present paper is to further test the distributional hypothesis, by examining the influence of context on similarity judgements involving marginally familiar and novel words. Our investigations are framed under the 'semantic space' approach to representing word meaning, to which we turn next.

Distributional Models of Word Meaning

The distributional hypothesis has provided the motivation for a class of objective statistical methods for representing meaning. Although the surge of interest in the approach arose in the fields of computational linguistics and information retrieval (e.g., Schütze, 1998; Grefenstette, 1994), where large-scale models of lexical semantics are crucial for tasks such as word sense disambiguation, high-dimensional 'semantic space' models are also useful tools for investigating how the brain represents the meaning of words. Word meaning can be considered to vary along many dimensions; semantic space models attempt to capture this variation in a coherent way, by positioning words in a geometric space. How to determine what the crucial dimensions are has been a long-standing problem; a recent and fruitful approach to this issue has been to label the dimensions of semantic space with words.
A word is located in the space according to the degree to which it co-occurs with each of the words labelling the dimensions of the space. Co-occurrence frequency information is extracted from a record of language experience – a large corpus of natural language. Using this approach, two words that tend to occur in similar linguistic contexts – that is, they are distributionally similar – will be positioned closer together in semantic space than two words which are not as distributionally similar. Such simple distributional knowledge has been implicated in a variety of language processing behaviours, such as lexical priming (e.g., Lowe & McDonald, 2000; Lund, Burgess & Atchley, 1995; McDonald & Lowe, 1998), synonym selection (Landauer & Dumais, 1997), retrieval in analogical reasoning (Ramscar & Yarlett, 2000) and judgements of semantic similarity (McDonald, 2000).

Contextual co-occurrence, the fundamental relationship underlying the success of the semantic space approach to representing word meaning, can be defined in a number of ways. Perhaps the simplest (and the approach taken in the majority of the studies cited above) is to define co-occurrence in terms of a 'context window': the co-occur-
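The window-based co-occurrence scheme described above can be made concrete with a small sketch. The toy corpus, dimension words, and window size below are invented assumptions for illustration, not the model or parameters used in the present paper; the cosine measure is one common choice of proximity in such spaces:

```python
import math
from collections import Counter

def cooc_vector(corpus, target, dims, window=2):
    """Count how often each dimension word occurs within `window` tokens
    of `target`; the counts give the target's position in the space."""
    counts = Counter()
    for i, tok in enumerate(corpus):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            for ctx in corpus[lo:i] + corpus[i + 1:hi]:
                if ctx in dims:
                    counts[ctx] += 1
    return [counts[d] for d in dims]

def cosine(u, v):
    """Cosine of the angle between two co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy corpus: 'cat' and 'dog' share linguistic contexts; 'car' does not.
corpus = ("the cat chased the mouse the dog chased the cat "
          "the dog ate food the cat ate food the car needs fuel").split()
dims = ["chased", "ate", "mouse", "food", "fuel", "needs"]
cat, dog, car = (cooc_vector(corpus, w, dims) for w in ["cat", "dog", "car"])
# Distributionally similar words end up closer together in the space:
assert cosine(cat, dog) > cosine(cat, car)
```

Even on this tiny corpus, 'cat' and 'dog' are positioned closer together than 'cat' and 'car', because they co-occur with the same dimension words ('chased', 'ate', 'food') inside the context window.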
[1] C. Osgood, et al. The Measurement of Meaning, 1958.
[2] J. M. Kittross. The measurement of meaning, 1959.
[3] L. Barsalou. Context-independent and context-dependent information in concepts, 1982, Memory & Cognition.
[4] D. Carnine. Utilization of Contextual Information in Determining the Meaning of Unfamiliar Words, 1984.
[5] K. W. Church, et al. Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.
[6] A. Agresti, et al. Categorical Data Analysis, 1991, International Encyclopedia of Statistical Science.
[7] D. Gentner, et al. Respects for similarity, 1993.
[8] G. Grefenstette, et al. Explorations in Automatic Thesaurus Discovery, 1994.
[9] M. Goldsmith, et al. Statistical Learning by 8-Month-Old Infants, 1996.
[10] E. Newport, et al. Word Segmentation: The Role of Distributional Cues, 1996.
[11] M. Patel, et al. Extracting Semantic Representations from Large Text Corpora, 1997, NCPW.
[12] T. Landauer, et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge, 1997.
[13] R. Chaffin, et al. Associations to unfamiliar words: Learning the meanings of new words, 1997, Memory & Cognition.
[14] H. Schütze, et al. Automatic Word Sense Discrimination, 1998, Computational Linguistics.
[15] W. Lowe, et al. Modelling functional priming and the associative boost, 1998.
[16] N. Chater, et al. Distributional Information: A Powerful Cue for Acquiring Syntactic Categories, 1998, Cognitive Science.
[17] W. Lowe, et al. The Direct Route: Mediated Priming in Semantic Space, 2000.
[18] S. A. McDonald, et al. Environmental Determinants of Lexical Processing Effort, 2000.
[19] D. Yarlett, et al. The Use of a High-Dimensional, "Environmental" Context Space to Model Retrieval in Analogy and Similarity-Based Transfer, 2000.
[20] H. Schütze, et al. Book Review: Foundations of Statistical Natural Language Processing, 1999, Computational Linguistics.