Aggregation computation over complex objects

Aggregation is an important but costly operation in database management systems. While aggregation in relational databases has been well studied, there has recently been growing interest in improving the performance of computing aggregates over complex objects. Each such object may have a time interval, a spatial location or region, or both, as in temporal, spatial, and spatio-temporal databases. An aggregation query over these objects typically involves a selection condition on their temporal and/or spatial attributes, e.g., aggregating over temporal records whose time intervals intersect a given time interval. A straightforward approach is to use an index structure to locate the objects that satisfy the selection condition and aggregate their values on the fly. Such indices are general in the sense that they can be used not only to compute aggregates but also to answer selection queries. However, the cost of an aggregation query is then proportional to the number of objects satisfying the selection condition; in the worst case, all objects in the database need to be examined to compute a single aggregate. In many applications, such as on-line analysis, aggregates must be computed very quickly, and scanning all existing objects may be too time-consuming. In this thesis, we focus on devising specialized indices for aggregation over complex objects and report our findings. Our research shows that the newly designed structures have much better query performance than existing solutions based on general indices, sometimes over a hundred times faster.
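To make the contrast concrete, here is a minimal sketch in Python of the temporal example mentioned above: a SUM over records whose closed time intervals intersect a query interval. It is an illustration only and is not one of the structures developed in the thesis. The record layout `(start, end, value)`, the function `naive_sum`, and the class `IntervalSumIndex` are all hypothetical names, and the sketch assumes a static, in-memory data set with `start <= end` for every record. The first function examines every record, so its cost grows with the data; the second answers the same query from two sorted arrays with prefix sums, using the observation that a record fails to intersect `[qs, qe]` exactly when it ends before `qs` or starts after `qe`.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def naive_sum(records, qs, qe):
    """Scan all records and add up those intersecting [qs, qe]."""
    return sum(v for (s, e, v) in records if s <= qe and e >= qs)


class IntervalSumIndex:
    """Hypothetical static index for SUM over intervals intersecting a query.

    Assumes closed intervals with start <= end and queries with qs <= qe.
    """

    def __init__(self, records):
        by_end = sorted((e, v) for (_, e, v) in records)
        by_start = sorted((s, v) for (s, _, v) in records)
        self.ends = [e for e, _ in by_end]
        self.starts = [s for s, _ in by_start]
        # prefix[i] holds the sum of the first i values in sorted order
        self.end_prefix = [0] + list(accumulate(v for _, v in by_end))
        self.start_prefix = [0] + list(accumulate(v for _, v in by_start))
        self.total = self.end_prefix[-1]

    def query(self, qs, qe):
        # Sum of values whose interval ends strictly before qs
        before = self.end_prefix[bisect_left(self.ends, qs)]
        # Sum of values whose interval starts strictly after qe
        after = self.total - self.start_prefix[bisect_right(self.starts, qe)]
        # The two excluded groups are disjoint, so subtract both from the total
        return self.total - before - after


records = [(1, 5, 10), (3, 8, 20), (9, 12, 5), (13, 15, 7)]
index = IntervalSumIndex(records)
print(naive_sum(records, 4, 9), index.query(4, 9))
```

Each query on the index takes two binary searches regardless of how many records match, whereas the scan touches every record. The sketch does not support insertions or deletions, or aggregates such as MIN and MAX that cannot be obtained by subtraction.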
