Language acquisition in a unification-based grammar processing system using a real-world knowledge base

One of the obstacles to be overcome in Natural Language Understanding is the existence of lexical gaps; that is, words or word senses which are not in the lexicon of the system. No lexicon, whether hand-coded or derived from an on-line dictionary, can ever be complete, in the sense of having entries for every word encountered in every syntactic category and with every semantic sense with which it may be used. To address this issue, this thesis describes the implementation of MURRAY, a learning mechanism which is able to (i) infer the syntactic properties of a new lexical item from its syntactic environment; (ii) infer the meaning of a novel lexical item based on context and a domain-specific database of real-world knowledge; and (iii) combine the syntactic and semantic properties of a given unknown word inferred from multiple pieces of input, resulting in a version space of possible lexical entries for the unknown, each consistent with all environments in which it has been encountered. MURRAY is an extension to an existing unification-based grammar processing system, UNICORN. It has been implemented to operate with grammars written in the style of Head-Driven Phrase Structure Grammar, though it is compatible with any unification-based grammar formalism. On each encounter with a word which does not exist in the lexicon of the system, MURRAY constructs a lexical version hyperspace: a disjunction of version spaces, one for each syntactic category to which the unknown could belong in the given linguistic context. With each further encounter with the unknown, information from the version spaces thus constructed is combined, so that from multiple inputs, the system converges on the target definition of the new word.
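The combination of version spaces across encounters can be illustrated with a minimal sketch. It is not MURRAY's implementation: MURRAY operates on feature structures within a unification-based grammar, whereas this sketch assumes, purely for illustration, that a version space is a plain set of candidate senses and that a hyperspace is a mapping from candidate syntactic categories to such sets. The names `VersionSpace`, `Hyperspace`, `combine`, and `converged`, along with the example categories and senses, are all hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical simplification: a version space is the set of candidate
# senses still consistent with the evidence seen so far. A hyperspace is
# a disjunction of version spaces, keyed by candidate syntactic category.
VersionSpace = FrozenSet[str]
Hyperspace = Dict[str, VersionSpace]


def combine(a: Hyperspace, b: Hyperspace) -> Hyperspace:
    """Keep only the hypotheses consistent with both encounters.

    A category survives only if both encounters allow it and the two
    version spaces for that category share at least one candidate.
    """
    combined: Hyperspace = {}
    for category in a.keys() & b.keys():
        shared = a[category] & b[category]
        if shared:
            combined[category] = shared
    return combined


def converged(h: Hyperspace) -> bool:
    """True when exactly one category with exactly one candidate remains."""
    return len(h) == 1 and all(len(space) == 1 for space in h.values())


# Illustrative data (assumed, not taken from the thesis): two encounters
# with the same unknown word in different linguistic contexts.
first_encounter: Hyperspace = {
    "noun": frozenset({"vessel", "person"}),
    "verb": frozenset({"move"}),
}
second_encounter: Hyperspace = {
    "noun": frozenset({"vessel", "tool"}),
}

result = combine(first_encounter, second_encounter)
```

In this example the second context rules out the verb reading and the two noun version spaces share only one candidate, so `result` keeps the single category `noun` with the single sense `vessel`, and `converged(result)` holds. Set intersection stands in here for the unification of partial lexical entries.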