Wavelet transformation-based management of integrated summary data for distributed query processing

Abstract As Internet technology evolves, there is a growing need for Internet queries that involve multiple information sources. Processing such queries efficiently requires integrated summary data that compactly represents the data distribution of the entire database scattered over many information sources. We propose a new method, based on the wavelet transform, that creates and maintains the integrated summary data by merging multiple instances of summary data, each maintained at one information source. Wavelet-based summary data can easily be converted to satisfy the conditions for merging. Moreover, the merging process is very simple owing to the shifting and linearity properties of the wavelet transform. We formally derive upper bounds on the absolute, square-root, and maximum errors of the integrated wavelet-based summary data. We also show that the integrated summary data can be used to optimize Internet queries effectively.
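The role of linearity in merging can be illustrated with a small sketch. It assumes the unnormalized Haar transform (pairwise averages and half-differences), that every information source summarizes a frequency vector over the same domain whose size is a power of two (a stand-in for the paper's "conditions for merging"), and that no coefficients have been discarded. The function names `haar_decompose`, `haar_reconstruct`, and `merge_summaries` are hypothetical and do not come from the paper. Under these assumptions, adding the coefficient vectors of two sources gives exactly the transform of the combined distribution.

```python
from typing import List, Sequence


def haar_decompose(freq: Sequence[float]) -> List[float]:
    """Unnormalized Haar decomposition: repeatedly replace pairs by their
    average and half-difference. Output order is
    [overall average, coarsest detail, ..., finest details].
    Assumes len(freq) is a power of two (hypothetical merging condition)."""
    n = len(freq)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in freq]
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs


def haar_reconstruct(coeffs: Sequence[float]) -> List[float]:
    """Invert haar_decompose: each (average a, detail d) pair expands to (a + d, a - d)."""
    data = [float(c) for c in coeffs]
    n = len(data)
    length = 1
    while length < n:
        averages = data[:length]
        details = data[length:2 * length]
        expanded: List[float] = []
        for a, d in zip(averages, details):
            expanded.extend([a + d, a - d])
        data[:2 * length] = expanded
        length *= 2
    return data


def merge_summaries(*summaries: Sequence[float]) -> List[float]:
    """Hypothetical merge step: by linearity of the transform, the integrated
    summary is the element-wise sum of the per-source coefficient vectors."""
    if len({len(s) for s in summaries}) != 1:
        raise ValueError("summaries must cover the same domain")
    return [sum(column) for column in zip(*summaries)]


if __name__ == "__main__":
    site_a = [2, 2, 0, 2, 3, 5, 4, 4]  # frequency vector at one source (made-up data)
    site_b = [1, 0, 3, 1, 0, 2, 2, 0]  # frequency vector at another source (made-up data)
    merged = merge_summaries(haar_decompose(site_a), haar_decompose(site_b))
    print(merged)
    print(haar_reconstruct(merged))
```

With the made-up vectors above, the merged coefficients equal `haar_decompose` applied to the element-wise sum of `site_a` and `site_b`, and reconstruction recovers that sum. In practice each source would keep only a few significant coefficients, so the merged summary is approximate; the error bounds mentioned above concern that case, which this sketch does not model.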