Introducing Supplemental Context for Word Sense Disambiguation

Microtext, the sparse and informal content typical of social media, is widely used to study various facets of today's society. This paper proposes the use of supplemental context to counteract the limitations that the sparsity and informality of microtext impose on the performance of word sense disambiguation (WSD). WSD relies on the senses of the words around an ambiguous word to disambiguate it. Because microtext is sparse and informal, it lacks exploitable context. This poses a major challenge for using this kind of data and, consequently, for the analyses in studies that rely on microtext sources. We demonstrate the use of supplemental context and describe some of the challenges in selecting and utilizing it. We present studies using Twitter data and validate them on around 10,000 tweets using a gold standard proxy we call the blue standard. The method relies on the notion of one sense per collocation, and we implement it by identifying collocated word sequences that are strongly indicative of the target word's sense.
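
To illustrate the one-sense-per-collocation idea, the following minimal Python sketch assigns a sense to a tweet only when the tweet contains collocations that all point to the same sense. The collocation table, the sense labels, the tokenizer, and the function names are hypothetical and chosen for illustration; this is a sketch under those assumptions, not the implementation used in the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical collocation table for the target word "apple".
# Each word sequence is assumed to be strongly indicative of one sense.
COLLOCATION_SENSES: Dict[Tuple[str, ...], str] = {
    ("apple", "pie"): "fruit",
    ("apple", "juice"): "fruit",
    ("apple", "store"): "company",
    ("apple", "watch"): "company",
}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = [tok.strip(".,!?;:\"'()#@") for tok in text.lower().split()]
    return [tok for tok in tokens if tok]


def label_by_collocation(
    text: str, collocations: Dict[Tuple[str, ...], str]
) -> Optional[str]:
    """Return a sense if the matched collocations agree on one, else None."""
    tokens = tokenize(text)
    senses = set()
    for colloc, sense in collocations.items():
        n = len(colloc)
        # Slide a window of the collocation's length over the tokens.
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == colloc:
                senses.add(sense)
                break
    # No match or conflicting matches: leave the tweet unlabeled.
    return senses.pop() if len(senses) == 1 else None
```

Leaving tweets unlabeled when no collocation matches, or when matched collocations disagree, is a design choice of this sketch: it trades coverage for label reliability.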