Accurate Semantic Annotations via Pattern Matching

This paper addresses the problem of producing accurate semantic annotations in a large corpus. Creating a sense-tagged corpus differs from the word sense disambiguation problem in that the semantic annotations have to be highly accurate, even at the price of lower coverage. While the state of the art in word sense disambiguation does not exceed 70% precision, we seek a means of performing semantic annotation with an accuracy close to 100%. We address this problem in the process of disambiguating the definitions in the WordNet dictionary. We propose a method that tags words with high precision, using pattern extraction followed by pattern matching. The algorithm exploits the idiosyncratic nature of the corpus to be tagged, and achieves a precision of 99% with a coverage of 6% measured on a WordNet subset, and an estimated coverage of more than 12.5% for the entire WordNet.
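
As an illustration only, the extract-then-match idea and the trade-off between precision and coverage can be sketched as follows. The pattern shape (a word together with its immediate left and right neighbours), the thresholds `min_count` and `min_precision`, the function names, and the sense labels are all assumptions made for this sketch; they are not taken from the method itself.

```python
from collections import Counter, defaultdict


def _context(words, i):
    """Return the hypothetical pattern key: (previous word, word, next word)."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return (prev_w, words[i], next_w)


def extract_patterns(tagged_glosses, min_count=2, min_precision=1.0):
    """Keep contexts that were seen often enough and almost always with one sense.

    tagged_glosses: list of glosses, each a list of (word, sense or None) pairs.
    """
    counts = defaultdict(Counter)
    for gloss in tagged_glosses:
        words = [w for w, _ in gloss]
        for i, (_, sense) in enumerate(gloss):
            if sense is not None:
                counts[_context(words, i)][sense] += 1

    patterns = {}
    for ctx, senses in counts.items():
        best_sense, best_n = senses.most_common(1)[0]
        total = sum(senses.values())
        if total >= min_count and best_n / total >= min_precision:
            patterns[ctx] = best_sense
    return patterns


def match_patterns(words, patterns):
    """Tag a word only when its context matches a stored pattern, else None."""
    return [patterns.get(_context(words, i)) for i in range(len(words))]


def precision_and_coverage(predicted, gold):
    """Precision over attempted words; coverage over all words that need a sense."""
    to_tag = [(p, g) for p, g in zip(predicted, gold) if g is not None]
    attempted = [(p, g) for p, g in to_tag if p is not None]
    correct = sum(1 for p, g in attempted if p == g)
    precision = correct / len(attempted) if attempted else 0.0
    coverage = len(attempted) / len(to_tag) if to_tag else 0.0
    return precision, coverage


# Toy data with made-up sense labels, for illustration only.
training = [
    [("a", None), ("kind", "kind#n#1"), ("of", None), ("tree", "tree#n#1")],
    [("a", None), ("kind", "kind#n#1"), ("of", None), ("bird", "bird#n#1")],
]
patterns = extract_patterns(training)
predicted = match_patterns(["a", "kind", "of", "fish"], patterns)
gold = [None, "kind#n#1", None, "fish#n#1"]
print(precision_and_coverage(predicted, gold))
```

In this sketch, raising `min_count` or `min_precision` discards less reliable patterns, which favours precision and lowers coverage, the same trade-off the abstract describes.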