Word sense disambiguation for free-text indexing using a massive semantic network

Semantics-free, word-based information retrieval is thwarted by two complementary problems. First, a search for relevant documents returns irrelevant items when all meanings of a search term are used, rather than just the meaning intended. This causes low precision. Second, relevant items are missed when they are indexed not under the actual search terms, but rather under related terms. This causes low recall. With semantics-free approaches there is generally no way to improve both precision and recall at the same time. Word sense disambiguation during document indexing should improve precision. We have investigated using the massive WordNet semantic network for disambiguation during indexing. With the unconstrained text of the SMART retrieval environment, we have had to derive our own content description from the input text, given only part-of-speech tagging of the input. We employ the notion of semantic distance between network nodes. Input text terms with multiple senses are disambiguated by finding the combination of senses from a set of contiguous terms which minimizes the total pairwise distance between senses. Results so far have been encouraging: improvement in disambiguation compared with chance is clear.
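The window-based search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny graph and its node names are invented (it is not WordNet), the helpers `build_graph`, `path_distance`, and `disambiguate_window` are hypothetical, and plain unweighted shortest-path length stands in for the semantic distance measure, which the abstract does not define. The sketch scores every combination of senses in a window of contiguous terms and keeps the one with the smallest total pairwise distance.

```python
from collections import deque
from itertools import combinations, product
import math

# Hypothetical toy network (NOT WordNet): undirected links between sense
# nodes and a few concept nodes, invented purely for illustration.
EDGES = [
    ("bank/finance", "financial_institution"),
    ("financial_institution", "money/currency"),
    ("money/currency", "deposit/payment"),
    ("bank/river", "river"),
    ("river", "deposit/sediment"),
    ("financial_institution", "entity"),
    ("river", "entity"),
]


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_distance(graph, source, target):
    """Unweighted shortest-path length via breadth-first search.

    Assumption: a stand-in for the paper's semantic distance, which
    this sketch does not reproduce.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return math.inf  # no path between the two senses


def disambiguate_window(senses_by_term, distance):
    """Choose one sense per term so the total pairwise distance among the
    chosen senses is minimal (exhaustive search over all combinations)."""
    best_combo, best_cost = None, math.inf
    for combo in product(*senses_by_term):
        cost = sum(distance(a, b) for a, b in combinations(combo, 2))
        if cost < best_cost:
            best_combo, best_cost = combo, cost
    return best_combo, best_cost


graph = build_graph(EDGES)
window = [
    ["bank/finance", "bank/river"],           # candidate senses of "bank"
    ["deposit/payment", "deposit/sediment"],  # candidate senses of "deposit"
    ["money/currency"],                       # "money": a single sense here
]
combo, cost = disambiguate_window(window, lambda a, b: path_distance(graph, a, b))
print(combo, cost)
# prints: ('bank/finance', 'deposit/payment', 'money/currency') 6
```

Because every combination of senses is scored, the work grows as the product of the sense counts of the terms in the window, which is one practical reason to keep the set of contiguous terms small.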