Tools for Lexicographers Revising an On-Line Thesaurus

Revising and extending published dictionaries and thesauri is an essential part of the lexicographer's work. These tasks are inherently difficult because of the large volume of data involved: consistency is hard to maintain, and checking or testing can become extremely tedious. Computer programs can help substantially. They can manipulate the data, sort them in various ways, and present relevant portions of the text to the lexicographer, who then makes decisions and carries out changes. This paper discusses the automatic data manipulation that we perform as part of our lexical work at the IBM Watson Research Center, and the ways in which it is relevant to lexicographers. Our research interest is in equipping the computer with lexical knowledge. The authors' recent efforts (Chodorow et al. 1988) have concentrated on giving the system some knowledge of synonyms derived from the machine-readable version of THE NEW COLLINS THESAURUS (henceforth CT).¹ Unlike humans, computers cannot rely on "common sense", so information that is implied or assumed in CT had to be made explicit. For example, headwords had to be supplied with their parts of speech, and synonyms had to be disambiguated. Because of the size of the source, these tasks had to be performed automatically.

In our computational manipulation of the CT material, we discovered some interesting properties of the interconnections found in the thesaurus: many of the links between synonyms are asymmetric, and many are intransitive. These properties of asymmetry and intransitivity are common to most thesauri, but their extent differs according to the size of the book and the judgements made by its lexicographers. Thus, the individual character of a particular thesaurus and its lexical content can be captured by a description of the patterns of asymmetry and intransitivity found in it.
Moreover, asymmetry and intransitivity are the product of human judgement, often exercised under conflicting criteria. Consequently, the finished product is likely to contain inconsistencies, and listings of asymmetric and intransitive links would seem useful during lexicographic revision. In the first section of this paper, we describe asymmetry and intransitivity as they appear in CT and discuss the concept of synonymy they express in the book. In the second section, we describe how we have automatically disambiguated the synonyms found in CT. Sense disambiguation was necessary in order to refer to particular senses of words, because synonymy links exist between senses of words, not between words themselves. In the rest of the paper, we discuss asymmetry (and, briefly, intransitivity) and suggest ways in which it can be captured and corrected.
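To make the two properties concrete, here is a minimal Python sketch of how listings of asymmetric and intransitive links could be produced. It is not the authors' implementation. It assumes a hypothetical representation in which each headword maps to the set of words listed as its synonyms, at the word level rather than the sense level. The function names `asymmetric_links` and `intransitive_triples` and the toy entries are invented for illustration and do not come from CT.

```python
# A minimal sketch (not the authors' code): a thesaurus is modelled as a
# mapping from each headword to the set of words listed as its synonyms.
Thesaurus = dict[str, set[str]]


def asymmetric_links(thesaurus: Thesaurus) -> list[tuple[str, str]]:
    """Return (a, b) where b is listed under a but a is not listed under b.

    A synonym that has no entry of its own also counts as asymmetric.
    """
    pairs = []
    for head, syns in thesaurus.items():
        for syn in syns:
            if head not in thesaurus.get(syn, set()):
                pairs.append((head, syn))
    return sorted(pairs)


def intransitive_triples(thesaurus: Thesaurus) -> list[tuple[str, str, str]]:
    """Return (a, b, c) where a lists b and b lists c, but a does not list c."""
    triples = []
    for a, syns_a in thesaurus.items():
        for b in syns_a:
            for c in thesaurus.get(b, set()):
                # Skip the trivial path that leads back to the starting headword.
                if c != a and c not in syns_a:
                    triples.append((a, b, c))
    return sorted(triples)


# Hypothetical toy entries, invented for illustration (not taken from CT).
toy: Thesaurus = {
    "big": {"large", "great"},
    "large": {"big"},
    "great": {"big", "eminent"},
    "eminent": {"great"},
    "huge": {"big"},
}

if __name__ == "__main__":
    print(asymmetric_links(toy))  # [('huge', 'big')]
    for triple in intransitive_triples(toy):
        print(triple)
```

In the toy data, the triple ('big', 'great', 'eminent') is reported as intransitive because "great" is linked to "big" in one sense and to "eminent" in another. This is why, in such a sketch, links would ideally be stated between word senses rather than between words.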