Corpus-Based Statistical Sense Resolution

The three corpus-based statistical sense resolution methods studied here attempt to infer the correct sense of a polysemous word by using knowledge about patterns of word cooccurrences. The techniques were based on Bayesian decision theory, neural, networks, and content vectors as used in information retrieval. To understand these methods better, we posed a very specific problem: given a set of contexts, each containing the noun line in a known sense, construct a classifier that selects the correct sense of line for new contexts. To see how the degree of polysemy affects performance, results from three- and six-sense tasks are compared.The results demonstrate that each of the techniques is able to distinguish six senses of line with an accuracy greater than 70%. Furthermore, the response patterns of the classifiers are, for the most part, statistically indistinguishable from one another. Comparison of the two tasks suggests that the degree of difficulty involved in resolving individual senses is a greater performance factor than the degree of polysemy.