In Defense of Spatial Models of Lexical Semantics

Michael N. Jones, Thomas M. Gruenenfelder, & Gabriel Recchia
{jonesmn, tgruenen, grecchia}@indiana.edu
Department of Psychological and Brain Sciences
Indiana University, Bloomington, Indiana USA

Abstract

Semantic space models of lexical semantics learn vector representations for words by observing statistical redundancies in a text corpus. A word's meaning is represented as a point in a high-dimensional semantic space. However, these spatial models have difficulty simulating human free association data due to the constraints placed upon them by the metric axioms, which appear to be violated in association norms. Here, we build on work by Griffiths, Steyvers, and Tenenbaum (2007) and test the ability of spatial semantic models to simulate association data when they are fused with a Luce choice rule to simulate the process of selecting a response in free association. The results provide an existence proof that spatial models can produce the patterns of data in free association previously thought to be problematic.

Keywords: semantic space model; latent semantic analysis; semantic networks; word association; metric axioms.

1. Introduction

A longstanding belief in theories of lexical semantics (dating back at least to Osgood, 1952) is that words can be represented as points in a multidimensional semantic space. Similarity between words is then defined as some function of their distance in the space. This classic notion of mental space has had an obvious impact on modern computational semantic space models, such as Latent Semantic Analysis (LSA; Landauer & Dumais, 1997). Models such as LSA borrow techniques from linear algebra to infer semantic representations for words from their contextual co-occurrences in linguistic corpora. In the resulting space, a word's meaning is represented by a vector over latent dimensions.
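The mechanism sketched in the abstract — a spatial representation fused with a Luce choice rule — can be illustrated concretely. The following is a minimal sketch, not the paper's actual implementation: the three-dimensional word vectors are hypothetical toy values standing in for a learned LSA space, and cosine similarity is normalized over the candidate responses to yield free-association response probabilities.

```python
import math

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def luce_choice(cue, candidates, vectors):
    """Luce choice rule: the probability of producing each candidate as a
    free-association response to `cue` is its similarity to the cue
    normalized by the summed similarity of all candidates."""
    sims = {w: cosine(vectors[cue], vectors[w]) for w in candidates}
    total = sum(sims.values())
    return {w: s / total for w, s in sims.items()}

# Hypothetical 3-dimensional "semantic space" (toy vectors for illustration).
vectors = {
    "stork": [0.9, 0.1, 0.2],
    "baby":  [0.8, 0.3, 0.1],
    "belt":  [0.1, 0.9, 0.3],
}

probs = luce_choice("stork", ["baby", "belt"], vectors)
```

Because the rule normalizes over whichever candidates compete for output on a given trial, the probability of a response depends on the cue's whole neighborhood, not just on a single pairwise distance — the property exploited later in the paper.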
Inter-word similarity is based on Euclidean geometry: Words that are more similar are more proximal in the learned space.

In contrast to spatial models, the recent popularity of probabilistic models of cognition has led to the development of Bayesian models of semantic representation, such as the LDA-based Topic model of Griffiths, Steyvers, and Tenenbaum (2007). In the Topic model, a word's representation is a probability distribution over latent semantic "topics." Given that LSA and the Topic model provide similar quantitative accounts of many semantic tasks, a popular misconception is that the models are isomorphic and that the Topic model is simply a more modern and generative version of LSA. However, the issue of whether humans represent meaning as a coordinate in space or as a conditional probability is a fundamental question in cognitive science, and has implications for downstream models that make use of these representations.

Tversky (1977) noted that spatial models must respect several metric axioms. First, in a metric space the distance between a point and itself must be zero: d(x, x) = 0 (minimality). Second, distance must respect symmetry: d(x, y) = d(y, x). Third, distance must respect the triangle inequality: if x and y are proximal and y and z are proximal, then x and z must be proximal as well (specifically, d(x, z) ≤ d(x, y) + d(y, z)). As Tversky and Gati (1982) demonstrated, human judgments of similarity routinely violate these axioms (specifically, symmetry and the triangle inequality). Tversky used human violations of the metric axioms to argue against spatial models of similarity, and instead proposed an additive feature comparison model.
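The three axioms can be stated as executable checks. This is a minimal sketch over hypothetical points in a 2-D Euclidean space, simply to make the constraints concrete; any genuinely metric space will pass all three, which is exactly why behavioral violations of symmetry and the triangle inequality are taken as evidence against naive spatial models.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Three hypothetical points in a 2-D space.
x, y, z = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

# Minimality: d(x, x) = 0
assert euclidean(x, x) == 0.0
# Symmetry: d(x, y) = d(y, x)
assert euclidean(x, y) == euclidean(y, x)
# Triangle inequality: d(x, z) <= d(x, y) + d(y, z)
assert euclidean(x, z) <= euclidean(x, y) + euclidean(y, z)
```

Free association data, by contrast, produce pairs where the "distance" implied by cue-response probabilities is asymmetric, so no assignment of points can satisfy all three checks at once.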
The spatial debate, however, has a long history in cognitive science, with Tversky’s work being followed by explanations of how metric spaces could produce violations of metric axioms (e.g., Krumhansl’s (1978) notion of density or Holman’s (1979) similarity and bias model). Griffiths et al. (2007) note that word association norms also violate metric axioms, making them problematic for semantic space models such as LSA. Probabilistic representations, however, are not subject to the same metric restrictions as spatial representations, and Griffiths et al. provide an elegant demonstration of how their Topic model can naturally account for the qualitative nature of these violations that LSA cannot. Word association norms contain a significant number of asymmetric associations: For example, the probability of generating baby as a response to stork as a cue is much greater than the reverse. Part of this effect is due to a bias to respond with a high frequency target independent of the cue, but part appears to be due to some sort of asymmetry in similarity. In addition, word association norms contain apparent violations of the triangle inequality axiom: To use the example from Griffiths et al. (2007), asteroid is strongly associated with belt, and belt is strongly associated with buckle, but asteroid and buckle have little association. Finally, Steyvers and Tenenbaum (2005) demonstrate that association norms contain neighborhood structure that is incompatible with spatial models. If one constructs an associative network with nodes representing words and connecting edges based on nonzero association probabilities, the resulting networks are scale-free: they have power law degree distributions and high clustering coefficients. Griffiths et al. demonstrate that while LSA (based on a thresholded cosine) cannot reproduce this
References

[1] Recchia, G., & Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior Research Methods.
[2] Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General.
[3] Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review.
[4] Tversky, A. (1977). Features of similarity. Psychological Review.
[5] Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science.
[6] Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
[7] Krumhansl, C. L. (1978). Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review.
[8] Jones, M. N., & Mewhort, D. J. K. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review.
[9] Maki, W. S., et al. (2008). Latent structure in measures of associative, semantic, and thematic knowledge. Psychonomic Bulletin & Review.
[10] Estes, W. K. (1975). Some targets for mathematical psychology.
[11] Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review.
[12] Holman, E. W. (1979). Monotonic models for asymmetric proximities.
[13] Nosofsky, R. M. (1991). Stimulus bias, asymmetric similarity, and classification. Cognitive Psychology.
[14] Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review.
[15] Osgood, C. E. (1952). The nature and measurement of meaning. Psychological Bulletin.
[16] Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science.
[17] Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation.