CONTEXT-BASED DIVERSIFICATION FOR KEYWORD QUERIES OVER XML DATA

While keyword queries empower ordinary users to search vast amounts of data, their ambiguity makes them difficult to answer effectively, especially when the queries are short and vague. To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on the different contexts of a query in the XML data. Given a short and vague keyword query and the XML data to be searched, we first derive keyword search candidates of the query using a simple feature selection model. We then design an effective XML keyword search diversification model to measure the quality of each candidate. After that, two efficient algorithms are proposed to incrementally compute the top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates are the most relevant to the given query, and they cover the maximal number of distinct results. Finally, a comprehensive evaluation on real and synthetic datasets demonstrates the effectiveness of our proposed diversification model and the efficiency of our algorithms.
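
To make the two selection criteria concrete, the following is a minimal illustrative sketch in Python, not the algorithms proposed in the paper. It assumes each query candidate already carries a relevance score and a set of result identifiers, and it uses a simple greedy rule (relevance multiplied by the number of not-yet-covered results) as a stand-in for the diversification model. The function name, the scoring rule, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_diverse_candidates(
    candidates: Dict[str, Tuple[float, Set[str]]],
    k: int,
) -> List[str]:
    """Greedily pick up to k candidates.

    `candidates` maps a candidate query to (relevance, result_ids).
    Both the data layout and the gain function are assumptions made
    for illustration only.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = dict(candidates)

    def gain(name: str) -> float:
        # Assumed scoring rule: relevance weighted by newly covered results.
        relevance, results = remaining[name]
        return relevance * len(results - covered)

    while remaining and len(selected) < k:
        # Sorting the names first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # no remaining candidate contributes a new result
        selected.append(best)
        covered |= remaining.pop(best)[1]
    return selected


# Hypothetical candidates for an ambiguous query.
example = {
    "database query": (0.9, {"r1", "r2", "r3"}),
    "database search": (0.8, {"r1", "r2"}),
    "xml query": (0.5, {"r4", "r5"}),
}
print(select_diverse_candidates(example, 2))
```

In this sketch, "database search" is skipped in the second round even though it is more relevant than "xml query", because all of its results are already covered by the first pick. That is the trade-off between relevance and distinct-result coverage that the two criteria describe.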