Paper information - Parallel Probing of Web Databases for Top-k Query Processing

A “top-k query” specifies a set of preferred values for the attributes of a relation and expects as a result the k objects that are “closest” to the given preferences according to some distance function. In many web applications, the relation attributes are only available via probes to autonomous web-accessible sources. Probing these sources sequentially to process a top-k query is inefficient, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. These characteristics of web sources motivate the introduction of parallel top-k query processing strategies, which are the focus of this paper. We present efficient techniques that maximize source-access parallelism to minimize query response time, while satisfying source access constraints. A thorough experimental evaluation over both synthetic and real web sources shows that our techniques can be significantly more efficient than previously proposed sequential strategies. In addition, we adapt our parallel algorithms for the alternate optimization goal of minimizing source load while still exploiting source-access parallelism.
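To make the setting concrete, the following is a minimal sketch of probing several sources in parallel under a per-source concurrency limit and then ranking objects. It is not the technique proposed in the paper: it naively probes every object at every source, whereas the paper's strategies aim to avoid unnecessary probes. The source names, the score values, the limit of two concurrent probes per source, and the average-score ranking (higher score meaning closer to the preferences) are all assumptions made for illustration.

```python
import heapq
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for web sources: each maps an object id to a
# per-attribute score in [0, 1], where 1 means closest to the preferred value.
SOURCES = {
    "price":    {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2},
    "rating":   {"a": 0.5, "b": 0.8, "c": 0.9, "d": 0.3},
    "distance": {"a": 0.6, "b": 0.9, "c": 0.8, "d": 0.1},
}

# Assumed access constraint: each source accepts at most 2 concurrent probes.
MAX_CONCURRENT_PROBES = 2
limits = {name: threading.Semaphore(MAX_CONCURRENT_PROBES) for name in SOURCES}


def probe(source, obj):
    """Simulate one probe; a real probe would be a high-latency web request."""
    with limits[source]:
        return obj, SOURCES[source][obj]


def top_k(objects, k):
    """Probe every (source, object) pair in parallel and return the k best objects."""
    totals = {obj: 0.0 for obj in objects}
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(probe, s, o) for s in SOURCES for o in objects]
        for future in futures:
            obj, score = future.result()
            totals[obj] += score
    # Rank by average score across sources (an assumed scoring function).
    return heapq.nlargest(k, objects, key=lambda o: totals[o] / len(SOURCES))


if __name__ == "__main__":
    print(top_k(["a", "b", "c", "d"], 2))
```

The semaphores model the kind of per-source restriction the abstract mentions; with real sources, the pool size and the limits would be chosen from each source's actual constraints.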

Luis Gravano | Amélie Marian
