Measuring Verb Similarity

Philip Resnik and Mona Diab
Department of Linguistics and Institute for Advanced Computer Studies
University of Maryland
College Park, MD USA
{resnik,mdiab}@umiacs.umd.edu

Abstract

The way we model semantic similarity is closely tied to our understanding of linguistic representations. We present several models of semantic similarity, based on differing representational assumptions, and investigate their properties via comparison with human ratings of verb similarity. The results offer insight into the bases for human similarity judgments and provide a testbed for further investigation of the interactions among syntactic properties, semantic structure, and semantic content.

Introduction

The way we model semantic similarity is closely tied to our understanding of how linguistic representations are acquired and used. Some models of similarity, such as Tversky's (1977), assume an explicit set of features over which a similarity measure can be computed, and recent computational methods for measuring word similarity can be thought of as an update of this idea on a large scale, representing words in terms of distributional features acquired via analysis of text corpora (e.g., Brown, Della Pietra, deSouza, Lai, & Mercer, 1992; Schütze, 1993). Other methods, following in the semantic networks tradition of Quillian (1968), focus less on explicit features and more on relationships among lexical items within a conceptual taxonomy, sometimes going beyond taxonomic relationships to also take advantage of frequency information derived from corpora (e.g., Rada, Mili, Bicknell, & Blettner, 1989; Resnik, 1999).
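As a concrete illustration of similarity computed over an explicit feature set, the following is a minimal sketch of a Tversky-style contrast measure. It assumes set cardinality as the salience function and uses a toy, hand-written feature inventory; the feature names, the weights, and the function name tversky_contrast are illustrative assumptions and are not taken from this paper or from any corpus.

```python
def tversky_contrast(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model similarity of feature sets a and b.

    Shared features raise the score; features distinctive to either
    item lower it. Set cardinality stands in for the salience
    function (an assumption made for this sketch).
    """
    a, b = set(a), set(b)
    common = len(a & b)   # features shared by both items
    only_a = len(a - b)   # features distinctive to a
    only_b = len(b - a)   # features distinctive to b
    return theta * common - alpha * only_a - beta * only_b


# Hypothetical feature sets, invented for illustration only.
FEATURES = {
    "devour": {"ingest", "food", "rapid"},
    "eat": {"ingest", "food"},
    "walk": {"motion", "on_foot"},
}

if __name__ == "__main__":
    for v1, v2 in [("devour", "eat"), ("devour", "walk")]:
        print(v1, v2, tversky_contrast(FEATURES[v1], FEATURES[v2]))
```

With unequal weights for the two distinctive-feature terms (alpha differing from beta), the measure becomes asymmetric, so the similarity of devour to eat need not equal that of eat to devour.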
Although some of these approaches are not explicitly designed as cognitive models, we have proposed that prediction of human similarity can provide a useful point of comparison for computational measures of similarity, noting that one must be aware that such comparisons can be quite sensitive to the specific choice of test items (Resnik, 1999). To date, we are only aware of comparisons having been done using noun similarity.

In this paper, we consider the problem of measuring the semantic similarity of verbs. Verb similarity is in many respects a different problem from noun similarity, because verb representations are generally viewed as possessing properties that nouns do not, such as syntactic subcategorization restrictions, selectional preferences, and event structure, and there are dependencies among these properties. [1] This means that particular care must be taken in selecting items, as discussed below, and it also means that the same computational measures may be capturing different properties for verbs than for nouns. For example, the is-a relationship in WordNet's verb taxonomy (Fellbaum, 1998), central in the computation of some measures, signifies generalization according to manner, as in devour is-a eat; concomitantly, the verb taxonomy is considerably wider and shallower than WordNet's noun taxonomy. Similarly, measures based on syntactic dependencies may be sensitive to syntactic adjuncts, such as locative and temporal modifiers, that occur predominantly with verbs rather than with nouns.

[1] Admittedly, the relevant contrast may turn out not to be part-of-speech per se; one could argue that some nouns carry similar kinds of participant information, observing, for example, that x's gift of y to z parallels x gave y to z. We are not attempting to address that issue here.

In what follows, we first discuss several different measures of word similarity and their properties. We then describe an experiment designed to obtain human similarity ratings for pairs of verbs, discuss the fit of the alternative measures to the human ratings, and suggest some implications of these results for future work.

Models of Verb Similarity

We consider three classes of similarity measure, corresponding to three kinds of lexical representation. In the first, verbs are associated with nodes in a semantic network. In the second, verbs are represented by distributional syntactic co-occurrence features obtained via analysis of a corpus. In the third, verbs are associated with lexical entries represented according to a theory of lexical conceptual structure. These classes of representation can be viewed as occupying three different points on the spectrum from non-syntactic to syntactically relevant facets of verb meaning.

Taxonomic Models

Taxonomic models of lexical and conceptual knowledge have a long history. In this work we use WordNet version 1.5, a large scale taxonomic representation of concepts lexicalized in English. As a model of the lexicon, WordNet's verb hierarchy is limited by design to paradigmatic relations, in explicit contrast to attempts to organize semantically coherent verb classes through shared syntactic behavior.

The simplest and most traditional measure of semantic similarity in a taxonomy counts the number of edges in-