Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation

In many document collections, documents are related to objects such as document authors, products described in the document, or persons referred to in the document. In many applications, the goal is to find the related objects that best match a set of keywords. The keywords need not occur in the textual descriptions of the target objects; they may occur only in the documents. To answer such queries, we exploit the relationships between the documents containing the keywords and the target objects related to those documents. Current keyword query paradigms do not use these relationships effectively and hence are inefficient for these queries. In this paper, we consider a class of queries called "object finder" queries. Our goal is to return the top K objects that best match a given set of keywords by exploiting the relationships between documents and objects. We design efficient algorithms by developing early termination strategies in the presence of blocking operators such as group by. Our experiments with real datasets and workloads demonstrate the effectiveness of our techniques. Although we present our techniques in the context of keyword search, they also apply to other types of ranked search (e.g., multimedia search).
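
To make the query semantics concrete, the following is a minimal sketch of an exhaustive baseline for an object finder query. It is not the early termination algorithm of the paper: it scans every matching document, groups the scores by related object, and only then picks the top K, which is exactly the blocking group-by behavior the abstract refers to. The function name `top_k_objects`, the input dictionaries, and the use of a sum as the aggregation function are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def top_k_objects(doc_scores, doc_to_objects, k):
    """Return the k objects with the highest aggregated document scores.

    doc_scores:     hypothetical map of document id -> keyword match score
    doc_to_objects: hypothetical map of document id -> related object ids
    Aggregation by summation is an assumption of this sketch.
    """
    totals = defaultdict(float)
    # Group-by step: every matching document contributes its score
    # to each object it is related to.
    for doc_id, score in doc_scores.items():
        for obj in doc_to_objects.get(doc_id, ()):
            totals[obj] += score
    # Top-K step: highest total first, ties broken by object id.
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy data: integer scores keep the arithmetic exact.
    doc_scores = {"d1": 9, "d2": 5, "d3": 4}
    doc_to_objects = {"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]}
    print(top_k_objects(doc_scores, doc_to_objects, 1))
```

Because the grouping step must consume all matching documents before any object's total is final, this baseline cannot stop early. Avoiding that full scan is the problem the early termination strategies are meant to address.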
