OSQR: overlapping clustering of query results

Query result clustering is the operation of grouping the rows of a relational query result into meaningful clusters. Such clustering gives the user a higher-level view of the data and makes the result easier to navigate [3]. A distinctive feature of query result clustering is that the underlying database from which the query result was drawn is also available. Generic categorical data clustering algorithms (see Berkhin [2] for a recent survey) spend substantial time and effort mining information from the given data in order to improve cluster quality. Because these algorithms assume an isolated setup, they cannot take advantage of the underlying database to obtain more meaningful clusters. In this paper, we propose OSQR, a novel approach to query result clustering that effectively uses additional information from the underlying database to generate meaningful overlapping clusters. OSQR also associates with each cluster a set of terms that characterize the cluster with respect to the entire underlying database. Moreover, OSQR automatically determines the appropriate number of clusters for the query result being considered.