The Structure of the Merriam-Webster Pocket Dictionary

The structure of a dictionary, in this case the Merriam-Webster Pocket Dictionary (1970 edition), has been revealed through a series of computational operations augmented by human interpretation. The following dissertation attempts to describe the details of this investigation, its procedures, and its conclusions regarding lexical structure. Describing the structure of the dictionary as a natural language artifact requires an appreciation both of the lexicographer's design for the dictionary and of the less apparent underlying organization that the dictionary suggests for the English lexicon. The dictionary is neither a purely formal description of the language nor a casual assemblage of textual definitions. It has structure both as an artifact of the lexicographer's art and, more importantly, as a result of the shared semantic and pragmatic knowledge of the world that lexicographers use in writing definitions. Because definitions possess a somewhat formal syntax, they are amenable to human analysis. Because tens of thousands of definitions are involved, computational techniques become necessary to augment this human analysis. This might serve as an informal definition of computational lexicology, i.e., the application of computational techniques to facilitate human analysis of the structure of the dictionary. The dictionary is shown to have a fundamentally taxonomic organization for nouns and verbs. Because the dictionary is a closed system, i.e., the words used in definitions are themselves defined elsewhere in the dictionary, chains of definitions naturally terminate in circular clusters. These clusters constitute the primitive concepts of the language, and exposing their existence and membership provides insights into English semantics. Once allowance is made for these primitive terminal clusters, the dictionary is an acyclic semi-lattice, with a maximum depth of less than 20 and widths of a few hundred senses.
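The claim that definition chains end in circular clusters, and that the dictionary becomes acyclic once those clusters are set aside, can be illustrated with a small graph computation. The sketch below is an illustration under stated assumptions, not the procedure used in this study: it assumes each word is linked to the genus (head) word of its definition, works at the word level rather than the sense level, and uses Tarjan's strongly connected components algorithm (a technique chosen here for illustration) to find the circular clusters. The toy `GENUS` data and the helper names `strongly_connected_components` and `analyze` are hypothetical. Collapsing each cluster to a single node yields a directed acyclic graph whose longest chain gives the depth.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: return the SCCs of `graph` as frozensets of nodes."""
    index, lowlink, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(frozenset(component))

    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs


def analyze(graph):
    """Return (primitive circular clusters, maximum depth of the collapsed graph)."""
    sccs = strongly_connected_components(graph)
    comp_of = {v: c for c in sccs for v in c}
    # Edges between distinct components form a DAG (the condensation).
    succ = {c: set() for c in sccs}
    for v, genera in graph.items():
        for w in genera:
            if comp_of[v] != comp_of[w]:
                succ[comp_of[v]].add(comp_of[w])

    def is_circular(c):
        return len(c) > 1 or any(v in graph.get(v, ()) for v in c)

    # Primitives: circular components from which no definition chain escapes.
    primitives = [c for c in sccs if not succ[c] and is_circular(c)]

    depth = {}

    def component_depth(c):
        if c not in depth:
            depth[c] = 0 if not succ[c] else 1 + max(component_depth(d) for d in succ[c])
        return depth[c]

    return primitives, max(component_depth(c) for c in sccs)


# Hypothetical toy data: word -> genus words of its definition(s).
GENUS = {
    "car": ["vehicle"], "truck": ["vehicle"], "bicycle": ["vehicle"],
    "vehicle": ["device"], "device": ["thing"],
    "thing": ["object"], "object": ["thing"],  # circular cluster
    "walk": ["move"], "run": ["move"],
    "move": ["go"], "go": ["move"],            # circular cluster
}

primitives, max_depth = analyze(GENUS)
for cluster in primitives:
    print(sorted(cluster))
print("max depth:", max_depth)
```

In this toy graph the chain `car → vehicle → device → {thing, object}` gives a depth of 3. On a full dictionary, the recursive traversal would need an iterative rewrite or a raised recursion limit, and sense-level links would require the kind of human disambiguation described above.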
The dictionary is seen as a profitable subject for future exploration and as potentially useful in many tasks in computational linguistics, artificial intelligence, and cognitive science. Techniques for providing detailed descriptions of the semantic domains of nouns and verbs are described, together with case studies of the verbs of motion and the vehicle nouns. The potential use of dictionary data for automatic disambiguation is discussed.