Web Query Refinement without Information Loss

Query refinement is an effective information retrieval technique that interactively suggests new keywords related to a user's query. Chen et al. [7] proposed a concept called coverage to address a specific problem: hundreds of thousands of keywords become refinement candidates simply because they appear in the relevant documents. Under this concept, all keywords are divided into two groups, prime keywords and non-prime keywords. Refinement candidates are chosen only from the prime keywords, which form a very small subset of all keywords. In this paper, we propose an algorithm for representing non-prime keywords effectively and efficiently, a problem left open in the previous work. We implemented a Web-based prototype system to demonstrate the feasibility of the refinement approach. Apart from refinement, our system behaves exactly like most of today's keyword-based search engines on the Internet. Experiments conducted on a variety of datasets confirm the effectiveness and efficiency of the approach in reducing the number of candidates.