Handling verbose queries for spoken document retrieval

Query-by-example information retrieval gives users a flexible yet efficient way to describe their information needs accurately. The query exemplars are usually long, taking the form of a partial or even a full document. However, they may contain extraneous terms that can degrade retrieval performance. To alleviate this problem, we propose a novel term-based query reduction mechanism that improves the informativeness of verbose query exemplars. We also explore the notion of term discrimination power to automatically select a salient subset of query terms. Experiments on the TDT Chinese collection show that the proposed approach is effective.
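
To make the idea of term-based query reduction concrete, the following is a minimal sketch rather than the method proposed here. It assumes that inverse document frequency (IDF), weighted by in-query term frequency, can stand in for term discrimination power; the abstract does not specify the actual measure. The function names `idf_scores` and `reduce_query`, the `keep` parameter, and the toy collection are all hypothetical.

```python
import math
from collections import Counter


def idf_scores(collection):
    """Compute IDF for every term in a collection of tokenized documents."""
    n_docs = len(collection)
    doc_freq = Counter()
    for doc in collection:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def reduce_query(query_terms, collection, keep=3):
    """Keep the `keep` query terms with the highest TF * IDF score.

    Assumption: TF * IDF is only a proxy for term discrimination power.
    Terms unseen in the collection receive a score of 0.
    """
    idf = idf_scores(collection)
    tf = Counter(query_terms)
    scored = {term: tf[term] * idf.get(term, 0.0) for term in tf}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scored, key=lambda term: (-scored[term], term))
    return ranked[:keep]


# Toy example: a verbose query containing the non-discriminative term "the".
collection = [
    ["the", "market", "rose"],
    ["the", "election", "result"],
    ["the", "market", "fell"],
]
query = ["the", "the", "election", "result", "market"]
reduced = reduce_query(query, collection, keep=2)
```

In this toy setting, a term that occurs in every document receives an IDF of zero and is dropped first, which mirrors the intuition that extraneous terms in a verbose exemplar contribute little to distinguishing relevant documents.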