TREC-4 Experiments at Dublin City University: Thresholding Posting Lists, Query Expansion with WordNet and POS Tagging of Spanish
In this paper we describe work done as part of the TREC-4 benchmarking exercise by a team from Dublin City University. In TREC-4 we had three activities, as follows.

In work on improving the efficiency of standard SMART-like query processing, we applied various thresholding processes to the postings lists of an inverted file, and we limited the number of document score accumulators available during query processing. The first run we submitted for evaluation in TREC-4 used our best set of thresholding and accumulator-set parameters.

The second run we submitted, DCU952, is based on query expansion using terms from WordNet. Essentially, for each original query term we determine its level of specificity or abstraction: for broad terms we add more specific terms, for specific original terms we add broader ones, and for terms in between we add both broader and narrower terms. Once the query is expanded, we delete all the original query terms, so that the documents this run contributes to the judged pool are ones our expansion would find but that other retrieval runs would not have found.

The third run we submitted was for Spanish data. We ran the entire document corpus through a POS tagger and indexed documents (and queries) by a combination of the base forms of non-stopwords plus their POS class. Retrieval is performed using SMART, with extra weights for query and document terms depending on their POS class.
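The efficiency work for the first run combines two ideas: skipping low-weight postings and capping the number of score accumulators. The sketch below shows one way these could fit together in term-at-a-time scoring. It is not DCU's implementation. The in-memory index layout, the `min_weight` threshold, and the rule that a full accumulator set still updates existing documents but admits no new ones are all assumptions made for illustration, and `score_with_limits` is a hypothetical name.

```python
from heapq import nlargest

# Hypothetical in-memory inverted file: term -> postings as (doc_id, weight).
Index = dict[str, list[tuple[int, float]]]


def score_with_limits(query: dict[str, float], index: Index,
                      max_accumulators: int, min_weight: float,
                      k: int = 10) -> list[tuple[int, float]]:
    """Term-at-a-time scoring with thresholded postings and capped accumulators."""
    accumulators: dict[int, float] = {}
    # Heaviest query terms go first so they claim the limited accumulators.
    for term, q_weight in sorted(query.items(), key=lambda item: -item[1]):
        for doc_id, d_weight in index.get(term, []):
            if d_weight < min_weight:
                continue  # posting removed by the (assumed) weight threshold
            if doc_id in accumulators:
                accumulators[doc_id] += q_weight * d_weight
            elif len(accumulators) < max_accumulators:
                accumulators[doc_id] = q_weight * d_weight
            # Otherwise the accumulator budget is spent: new documents are
            # ignored, but documents that already have one keep being updated.
    return nlargest(k, accumulators.items(), key=lambda item: item[1])


if __name__ == "__main__":
    index = {
        "wordnet": [(1, 0.9), (2, 0.4), (3, 0.05)],
        "spanish": [(2, 0.8), (4, 0.7)],
    }
    # Doc 3 is thresholded out; doc 4 arrives after both accumulators are taken.
    print(score_with_limits({"wordnet": 1.0, "spanish": 0.5}, index,
                            max_accumulators=2, min_weight=0.1))
    # → [(1, 0.9), (2, 0.8)]
```

Lowering `max_accumulators` or raising `min_weight` reduces work per query at the risk of missing relevant documents, and that trade-off is what the parameter tuning for the first run explored.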
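The specificity-driven expansion used for DCU952 can be illustrated on a toy hierarchy. The sketch below assumes that specificity is measured as depth in an IS-A tree, which is one plausible proxy and not necessarily the one the paper used. It also assumes fixed depth cut-offs (`broad_max_depth`, `specific_min_depth`) and a small hand-written `HYPERNYM` table standing in for WordNet. All of these names and values are hypothetical. Broad terms receive narrower terms, specific terms receive broader ones, terms in between receive both, and the original query terms are then removed.

```python
# Toy IS-A hierarchy standing in for WordNet: term -> its broader term (hypernym).
HYPERNYM = {
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "sedan": "car",
    "convertible": "car",
}


def depth(term: str) -> int:
    """Specificity proxy: number of hypernym links from term up to the root."""
    d = 0
    while term in HYPERNYM:
        term = HYPERNYM[term]
        d += 1
    return d


def broader(term: str) -> list[str]:
    return [HYPERNYM[term]] if term in HYPERNYM else []


def narrower(term: str) -> list[str]:
    return sorted(child for child, parent in HYPERNYM.items() if parent == term)


def expand_query(terms: list[str], broad_max_depth: int = 1,
                 specific_min_depth: int = 3) -> list[str]:
    expanded: list[str] = []
    for term in terms:
        d = depth(term)
        if d <= broad_max_depth:          # broad term: add narrower terms
            new = narrower(term)
        elif d >= specific_min_depth:     # specific term: add broader terms
            new = broader(term)
        else:                             # in between: add both
            new = narrower(term) + broader(term)
        for t in new:
            if t not in expanded:
                expanded.append(t)
    # Finally drop every original query term from the expanded query.
    originals = set(terms)
    return [t for t in expanded if t not in originals]


print(expand_query(["car", "sedan"]))  # → ['convertible', 'vehicle']
```

In the example, "sedan" contributes "car" as a broader term, but "car" is itself an original query term, so it is removed along with "sedan". The example mirrors how deleting the originals keeps the run from retrieving what an unexpanded query would retrieve.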
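For the Spanish run, documents and queries are indexed by base form plus POS class, with extra weights depending on that class. The sketch below assumes the tagger's output is already available as `(word, lemma, pos)` triples; no real tagger is called. The POS weights, the default weight, and the stopword list are hypothetical values chosen for illustration. Plain term frequency times a POS weight stands in for SMART's actual weighting schemes, which are not reproduced here.

```python
from collections import Counter

# Hypothetical POS-class weights and stopword list (illustrative values only).
POS_WEIGHT = {"NOUN": 1.5, "VERB": 1.2, "ADJ": 1.0}
DEFAULT_POS_WEIGHT = 0.5
STOPWORDS = {"el", "la", "los", "las", "de", "y", "en", "que"}


def index_terms(tagged: list[tuple[str, str, str]]) -> dict[str, float]:
    """Map (word, lemma, pos) triples to weighted 'lemma/POS' index terms."""
    counts = Counter(
        f"{lemma}/{pos}"
        for word, lemma, pos in tagged
        if word.lower() not in STOPWORDS
    )
    # Raw term frequency scaled by a weight chosen from the term's POS class.
    return {
        term: tf * POS_WEIGHT.get(term.rsplit("/", 1)[1], DEFAULT_POS_WEIGHT)
        for term, tf in counts.items()
    }


def score(query: dict[str, float], doc: dict[str, float]) -> float:
    """Inner product over shared lemma/POS terms."""
    return sum(w * doc.get(term, 0.0) for term, w in query.items())


if __name__ == "__main__":
    doc = index_terms([("Los", "el", "DET"), ("gatos", "gato", "NOUN"),
                       ("comen", "comer", "VERB"), ("pescado", "pescado", "NOUN")])
    query = index_terms([("gato", "gato", "NOUN"), ("come", "comer", "VERB")])
    print(score(query, doc))
```

Because the POS class is part of the index term, a query containing the verb "comer" (comer/VERB) does not match a document in which "comer" is tagged as a noun (comer/NOUN). Conflating inflected forms through the lemma still lets "gatos" match "gato".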