Towards Computational Guessing of Unknown Word Meanings: The Ontological Semantic Approach

Julia M. Taylor (jtaylor1@purdue.edu)
CERIAS, Purdue University & RiverGlass, Inc.
West Lafayette, IN 47907 & Champaign, IL 61820

Victor Raskin (vraskin@purdue.edu)
Linguistics & CERIAS, Purdue University
West Lafayette, IN 47907

Christian F. Hempelmann (chempelm@purdue.edu)
Linguistics, Purdue University & RiverGlass, Inc.
West Lafayette, IN 47907 & Champaign, IL 61820

Abstract

The paper describes a computational approach to guessing the meanings of previously unaccounted-for words in an implemented system for natural language processing. Interested in comparing the results to what is known about human guessing, it reviews a largely educational approach, partially based on cognitive psychology, to teaching humans, mostly children, to acquire new vocabulary from contextual clues, as well as the lexicographic efforts to account for neologisms. It then goes over previous NLP efforts in processing new words and establishes the difference (mostly, much richer semantic resources) of the proposed approach. Finally, the results of a computer experiment are presented and discussed, in which the meaning of a non-existent word, placed as the direct object of 100 randomly selected verbs, is guessed from the known meanings of these verbs with the methods of the ontological semantic technology. While the results are promising percentage-wise, ways to improve them within the approach are briefly outlined.

Keywords: guessing word meaning, natural language understanding, ontological semantic technology

Unknown Words in Text

Along with ambiguity, unattested input is one of the major problems for natural language processing systems. An NLP system is robust only if it can deal with unknown words. Yet, dealing with such words only makes sense when the rest of the sentence is understood.
We take an approach here similar to that of a human learner who encounters an unfamiliar word and is able to approximate its meaning from the rest of the sentence or from its subsequent uses in other sentences. Several strategies have been suggested for the human acquisition and understanding of unknown words. Some cases stand out as easy and almost self-explanatory. One such case is when a word is immediately explained. The explanation may be introduced by a that is phrase (To lose weight, one may have to follow a diet, that is, to limit the amount of food and to avoid eating certain foods.), by apposition (Computer programs follow algorithms, ordered lists of instructions to perform.), by examples (The earliest records of felines, for example, cats, tigers, lions, or leopards, are from millions of years ago.), or by providing presumably known opposites for comparison through words like but, rather than, or not (It is frigid outside, rather than warm and comfortable like yesterday.). Both for human acquisition of new vocabulary and for machine attempts at guessing its meaning, these somewhat trivial instances, where the meaning of a new word is immediately explained, either by giving its definition or by examples, are of no particular interest here. Besides, such cases are rather rare in ordinary expository texts because most writers do not allow for vocabulary deficiency with regard to words with which they themselves are well familiar. Thus, it is the non-trivial cases, those without an attached explanation or description, that must be addressed when designing a computer system for natural language understanding. On the other side of the spectrum lie words that can only be guessed through their functional descriptions, which do not necessarily follow the first use of the unknown word.
These functional descriptions should be gathered throughout the document, or across a number of documents, narrowing the original functional description, if necessary, or supplying other facets of it. For example: They used a Tim-Tim to navigate their way to the cabin on the lake. It took them almost half a day. They hadn't checked if the maps had been recently updated on the device, and spent hours looking for roads that no longer existed. From the clues in the first sentence, Tim-Tim can be understood as a navigation instrument (including an atlas or a map) through an inverse function of the instrument of navigation. Since no other devices are mentioned, this navigation instrument can be taken to be the device from the third sentence whose maps can be periodically updated. It is essential, therefore, in situations of dispersed clues that co-reference (or antecedence) be established correctly, in this case between device and Tim-Tim. Towards the middle of the spectrum are the cases where the description may immediately follow the first use of the word but without being helpfully triggered by phrases like for example or that is (He was rather taciturn. He didn't like
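The clue-combining strategy described above can be sketched computationally: each verb that takes the unknown word as its direct object contributes an ontological constraint on the word's meaning, and intersecting the fillers of those constraints narrows the set of candidate concepts. The following Python sketch is a minimal illustration under assumed resources: the toy ontology, the verb senses (use-for-navigation, update-maps-on), and their THEME constraints are hypothetical stand-ins, not the actual Ontological Semantic Technology ontology or lexicon.

```python
# Toy ontology: child concept -> parent concept (single inheritance).
ONTOLOGY = {
    "navigation-instrument": "device",
    "map": "navigation-instrument",
    "gps-unit": "navigation-instrument",
    "device": "physical-object",
    "food": "physical-object",
    "physical-object": "all",
}

def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    result = {concept}
    while concept in ONTOLOGY:
        concept = ONTOLOGY[concept]
        result.add(concept)
    return result

def descendants(concept):
    """Return all concepts at or below the given concept."""
    return {c for c in list(ONTOLOGY) + ["all"] if concept in ancestors(c)}

# Hypothetical THEME (direct-object) constraints for the observed verbs.
VERB_THEME = {
    "use-for-navigation": "navigation-instrument",
    "update-maps-on": "device",
}

def guess_concept(observed_verbs):
    """Intersect the permissible THEME fillers of every verb that took
    the unknown word as its direct object; the surviving concepts are
    the candidate meanings."""
    candidates = None
    for verb in observed_verbs:
        fillers = descendants(VERB_THEME[verb])
        candidates = fillers if candidates is None else candidates & fillers
    return candidates

# Tim-Tim appears as the object of both verbs, so the candidates
# narrow to navigation instruments and their subtypes.
print(sorted(guess_concept(["use-for-navigation", "update-maps-on"])))
```

In this sketch each additional clause mentioning the unknown word can only shrink the candidate set, which mirrors how the dispersed clues in the Tim-Tim passage progressively refine the guess; the real system would, of course, also weigh facets other than THEME and handle co-reference.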
