Toward the Essential Nature of Statistical Knowledge in Sense Resolution

The statistical basis for sense resolution decisions is obtained by applying a process to a corpus of instances. In general, once the process has been applied, the system contains both some residual representation of the instances and some explicit augmentation of that representation with information that was implicit in the corpus. For example, part of the residual representation of "He feels happy on Fridays" might be the (word sense) pair (happy feel-as-emotion), and part of the augmentation might be the probability of happy co-occurring with the sense of feel as an emotion. We show that, for the simple residual representation of (word sense) pairs, the mere existence of such a representation captures much of the regularity inherent in the data. We also demonstrate that augmenting the residual representation with the actual number of times each pair occurs in the training corpus provides most of the remaining power of probabilistic approaches. Finally, we show how viewing this residual representation as a form of episodic memory can enable symbolic, knowledge-rich systems to take advantage of this source of regularity in performing sense resolution.
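The distinction between the residual representation and its augmentation can be made concrete with a small sketch. The toy corpus, the sense labels other than feel-as-emotion, and the helper names (build_counts, resolve, by_existence, by_count) are all assumptions introduced for illustration; this is not the procedure evaluated in the paper, only one plausible way to compare deciding by pair existence with deciding by pair counts.

```python
from collections import Counter

# Hypothetical toy corpus: (context words, sense of "feel") per instance.
CORPUS = [
    (["he", "happy", "fridays"], "feel-as-emotion"),
    (["she", "sad", "today"], "feel-as-emotion"),
    (["he", "happy", "today"], "feel-as-emotion"),
    (["he", "fabric", "rough"], "feel-as-touch"),
    (["she", "fabric", "today"], "feel-as-touch"),
]


def build_counts(corpus):
    """Count how often each (word, sense) pair occurs in the corpus."""
    counts = Counter()
    for words, sense in corpus:
        for word in words:
            counts[(word, sense)] += 1
    return counts


def resolve(words, senses, score):
    """Return the sense with the highest summed per-word score."""
    return max(senses, key=lambda s: sum(score(w, s) for w in words))


counts = build_counts(CORPUS)            # augmentation: occurrence counts
pairs = set(counts)                      # residual representation: pairs seen
senses = sorted({sense for _, sense in CORPUS})


def by_existence(word, sense):
    """Score 1 if the pair was ever observed, else 0."""
    return 1 if (word, sense) in pairs else 0


def by_count(word, sense):
    """Score by the number of times the pair was observed."""
    return counts[(word, sense)]


print(resolve(["happy", "fridays"], senses, by_existence))
print(resolve(["fabric", "rough"], senses, by_count))
```

Here the set pairs plays the role of the residual representation, and counts adds the occurrence information. A probabilistic variant would go one step further and normalize the counts into estimates such as the relative frequency of a sense given a word.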