NLP Track at TREC-5

Abstract: The Natural Language Processing (NLP) track was organized for the first time at TREC-5 to provide a more focused look at how NLP techniques can improve performance in information retrieval (IR). The intent was to see whether the NLP techniques available today are mature enough to have an impact on IR, and whether they offer an advantage over purely quantitative methods. TREC-5 was also the place to try more expensive and riskier solutions than those used in the main TREC evaluations. This NLP track demonstrated that NLP techniques have a solid but limited impact on the quality of text retrieval, particularly precision. Techniques aimed at producing higher-quality queries (e.g., query expansion, constraints) appear to be more effective than those aimed primarily at improving the indexing of database documents. More work is needed before substantial gains can be seen, including the use of more advanced and expensive semantic analysis techniques. Figure 2 summarizes the NLP techniques that have been tried in information retrieval, and what their potential might be for improving retrieval precision. This chart was discussed at the NLP track workshop on the last day of TREC-5. The consensus was that NLP techniques that show particular promise in relatively small-scale track evaluations should be transferred to the main evaluations as soon as practical.