The use of phrases from query texts in information retrieval (poster session)

1. Introduction

In this paper we examine the utility of phrases as search terms in information retrieval (IR). We focus on linguistically motivated phrases extracted from query texts by natural language processing. We use shallow syntactic processing rather than statistical processing to automatically identify candidate phrasal terms in query texts, on the assumption that syntactic phrases are more meaningful than statistically obtained word pairs, and thus more powerful for discriminating among documents. While Strzalkowski et al. [6] and Mitra et al. [4] have found phrases to be useful indexing units, we use phrases only for query construction, to avoid the computationally expensive process of identifying phrases in every document of a collection. Phrases are represented using a proximity operator and issued to our IR system. To yield a "value-added" phrase representation, we investigated the effect of proximity variations on retrieval effectiveness. We also sought a scheme for adjusting phrasal term weights, because an adequate term weighting scheme is critical in term-based statistical information retrieval. By experimenting with weighting variations imposed on phrasal terms extracted from TREC-7 topics, we adopted the scheme that maximized retrieval performance in our IR system.

The rest of the paper is organized as follows. Section 2 presents our query construction techniques, especially those that deal with phrasal terms. Section 3 describes the results of a performance evaluation using TREC-8 topics. Section 4 concludes the paper.
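As a rough illustration of representing a query phrase with a proximity operator and a separately adjustable weight, the following minimal sketch builds a query string from single-word terms and phrases. The operator notation (`NEAR/n`, `WSUM`), the function names, and the default window and weight values are all hypothetical and chosen only for illustration; they are not the syntax or the settings of the IR system used in this paper.

```python
def proximity_term(phrase, window, weight):
    """Render a multi-word phrase as a weighted proximity expression.

    The NEAR/n(...) notation is invented for this sketch: it stands for
    "all words occur within a window of n tokens".
    """
    words = phrase.lower().split()
    if len(words) < 2:
        raise ValueError("a phrasal term needs at least two words")
    return f"{weight:g} NEAR/{window}({' '.join(words)})"


def build_query(single_terms, phrases, window=5, phrase_weight=0.5):
    """Combine single-word terms (weight 1) with weighted phrasal terms.

    `window` and `phrase_weight` are the two knobs one would vary when
    studying proximity and weighting effects; the defaults are arbitrary.
    """
    parts = [f"1 {term.lower()}" for term in single_terms]
    parts += [proximity_term(p, window, phrase_weight) for p in phrases]
    return "WSUM(" + " ".join(parts) + ")"


if __name__ == "__main__":
    print(build_query(["retrieval"], ["information retrieval"], window=3))
```

Keeping the window size and the phrase weight as explicit parameters makes it straightforward to sweep both values over a set of training topics and compare the resulting retrieval effectiveness.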