Predicting the impact of expansion terms using semantic and user interaction features

Query expansion for Information Retrieval is a challenging task. On the one hand, low quality expansion may hurt either recall, due to vocabulary mismatch, or precision, due to topic drift, and therefore reduce user satisfaction. On the other hand, utilizing a large number of expansion terms for a query may easily lead to resource consumption overhead. As web search engines apply strict constraints on response time, it is essential to estimate the impact of each expansion term on query performance at the pre-retrieval time. Our experimental results confirm that a significant part of expansions do not improve query performance, and it is possible to detect such expansions at the pre-retrieval time.