Lexical ambiguity and Information Retrieval revisited

A number of previous experiments on the role of lexical ambiguity in Information Retrieval are reproduced on the IR-Semcor test collection (derived from Semcor), where both queries and documents are hand-tagged with phrases, Part-Of-Speech and WordNet 1.5 senses. Our results indicate that a) Word Sense Disambiguation can be more beneficial to Information Retrieval than the experiments of Sanderson (1994) with artificially ambiguous pseudo-words suggested, b) Part-Of-Speech tagging does not seem to help improve retrieval, even if it is manually annotated, and c) using phrases as indexing terms is not a good strategy if no partial credit is given to the phrase components.

1 Introduction

A major difficulty when experimenting with lexical ambiguity issues in Information Retrieval is to differentiate the effects of the indexing and retrieval strategy being tested from the effects of tagging errors. Some examples are:

1. In (Richardson and Smeaton, 1995), a sophisticated retrieval system based on conceptual similarity resulted in a decrease of IR performance. It was not possible, however, to distinguish the effects of the strategy from the effects of automatic Word Sense Disambiguation (WSD) errors. In (Smeaton and Quigley, 1996), a similar strategy, combined with manual disambiguation and very short documents (image captions), produced, however, an improvement of IR performance.

2. In (Krovetz, 1997), discriminating word senses with different Part-Of-Speech tags (as annotated by the Church POS tagger) also harmed retrieval efficiency. Krovetz noted that more than half of the words in a dictionary that differ in POS are related in meaning, but he could not decide whether the decrease of performance was due to the loss of such semantic relatedness or to automatic POS tagging errors.

3. In (Sanderson, 1994), the problem of discerning the effects of differentiating word senses from the effects of inaccurate disambiguation was overcome using artificially created pseudo-words (substituting, for instance, all occurrences of banana or kalashnikov with banana/kalashnikov) that could be disambiguated with 100% accuracy (substituting banana/kalashnikov back to the original term in each occurrence, either banana or kalashnikov); see the sketch at the end of this section. He found that IR processes were quite resistant to increasing degrees of lexical ambiguity, and that disambiguation harmed IR efficiency if performed with less than 90% accuracy. The question is whether real ambiguous words would behave as pseudo-words do.

4. In (Schütze and Pedersen, 1995) it was shown that sense discriminations extracted from the test collections may enhance text retrieval. However, the static sense inventories in dictionaries or thesauri, such as WordNet, have not been used satisfactorily in IR. For instance, in (Voorhees, 1994), manual expansion of TREC queries with semantically related words from WordNet only produced slight improvements with the shortest queries.

In order to deal with these problems, we designed an IR test collection which is hand-annotated with Part-Of-Speech and semantic tags from WordNet 1.5. This collection was first introduced in (Gonzalo et al., 1998) and is described in Section 2. The collection is quite small by current IR standards (it is only slightly bigger than the TIME collection), but it offers a unique chance to analyze the behavior of semantic approaches to IR before scaling them up to TREC-size collections, where manual tagging is unfeasible.
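As a concrete illustration of the pseudo-word technique mentioned in item 3 above, the following minimal Python sketch shows how artificial ambiguity can be introduced and then resolved with 100% accuracy. It is not code from Sanderson (1994) or from our experiments; the word groups, the ambiguate/disambiguate names and the accuracy parameter are illustrative assumptions only.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical ambiguity groups; the paper's example pairs "banana" with "kalashnikov".
GROUPS = [("banana", "kalashnikov")]

# Map every member of a group to the concatenated pseudo-word, e.g. "banana/kalashnikov".
PSEUDO: Dict[str, str] = {w: "/".join(g) for g in GROUPS for w in g}


def ambiguate(tokens: List[str]) -> List[Tuple[str, str]]:
    """Replace each token by its pseudo-word, keeping the original form
    so that perfectly accurate disambiguation is possible later."""
    return [(PSEUDO.get(t, t), t) for t in tokens]


def disambiguate(tagged: List[Tuple[str, str]], accuracy: float = 1.0) -> List[str]:
    """Map each pseudo-word back to one of its members.  With accuracy=1.0 the
    original token is always restored (100% accurate disambiguation); lower
    values simulate WSD errors by sometimes choosing another group member."""
    restored = []
    for pseudo, original in tagged:
        if "/" not in pseudo or random.random() < accuracy:
            restored.append(original)
        else:
            restored.append(random.choice([w for w in pseudo.split("/") if w != original]))
    return restored


tokens = "the banana was ripe".split()
tagged = ambiguate(tokens)
# tagged == [('the', 'the'), ('banana/kalashnikov', 'banana'), ('was', 'was'), ('ripe', 'ripe')]
print(disambiguate(tokens := tagged))                 # perfect WSD: ['the', 'banana', 'was', 'ripe']
print(disambiguate(tagged, accuracy=0.75))            # simulated noisy WSD
```

Because the original term is recorded at substitution time, "disambiguation" of a pseudo-word is trivially perfect; this is what makes pseudo-words a controlled, if artificial, testbed for studying the effect of ambiguity on retrieval.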
In (Gonzalo et al., 1998), we used the manual annotations in the IR-Semcor collection to show that indexing with WordNet synsets can give significant improvements to Text Retrieval, even for large queries. Such a strategy works better than the synonymy expansion in (Voorhees, 1994), probably because it identifies synonym terms but, at the same