STHoles: a multidimensional workload-aware histogram

The attributes of a relation are typically not independent, so multidimensional histograms can be an effective tool for accurate multiattribute query selectivity estimation. In this paper, we introduce STHoles, a “workload-aware” histogram that allows bucket nesting to capture data regions with reasonably uniform tuple density. STHoles histograms are built without examining the data sets, solely by analyzing query results. Buckets are allocated where the workload indicates they are needed most, which leads to accurate query selectivity estimates. Our extensive experiments demonstrate that STHoles histograms consistently produce good selectivity estimates across synthetic and real-world data sets and across query workloads, and, in many cases, outperform the best multidimensional histogram techniques that require access to and processing of the full data sets during histogram construction.
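To make the idea of nested buckets with roughly uniform tuple density concrete, the following is a minimal sketch of how a selectivity estimate could be read off such a histogram. It is an illustration under stated assumptions, not the paper's implementation: the names `Bucket`, `volume`, `intersect`, and `estimate` are hypothetical, each bucket is assumed to be an axis-aligned box whose frequency counts only the tuples outside its child buckets, children are assumed to be disjoint and fully contained in their parent, and the bucket frequencies in the usage lines are made up. How buckets are created and refined from query feedback is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical representation: one (low, high) interval per dimension.
Box = List[Tuple[float, float]]


def volume(box: Box) -> float:
    """Volume of an axis-aligned box; an empty interval contributes 0."""
    v = 1.0
    for lo, hi in box:
        v *= max(0.0, hi - lo)
    return v


def intersect(a: Box, b: Box) -> Box:
    """Per-dimension intersection of two boxes (may be empty)."""
    return [(max(al, bl), min(ah, bh)) for (al, ah), (bl, bh) in zip(a, b)]


@dataclass
class Bucket:
    box: Box
    freq: float  # assumed: tuples in this box that lie outside every child
    children: List["Bucket"] = field(default_factory=list)

    def own_volume(self, within: Box = None) -> float:
        """Volume of the box minus its children, optionally clipped to `within`."""
        def clip(b: Box) -> Box:
            return b if within is None else intersect(b, within)

        v = volume(clip(self.box))
        for child in self.children:
            v -= volume(clip(child.box))
        return v


def estimate(bucket: Bucket, query: Box) -> float:
    """Estimated tuple count for a range query, assuming uniform density
    inside each bucket's own region (box minus nested children)."""
    own = bucket.own_volume()
    est = 0.0
    if own > 0:
        est = bucket.freq * bucket.own_volume(query) / own
    return est + sum(estimate(child, query) for child in bucket.children)


# Usage with made-up numbers: a sparse root bucket containing one dense hole.
dense = Bucket(box=[(0.0, 5.0), (0.0, 5.0)], freq=300.0)
root = Bucket(box=[(0.0, 10.0), (0.0, 10.0)], freq=100.0, children=[dense])
left_half = estimate(root, [(0.0, 5.0), (0.0, 10.0)])
```

In this sketch the dense region is carved out of the root as a nested bucket, so a query covering the left half of the domain receives all of the dense bucket's tuples plus only the share of the root's tuples proportional to the overlapped volume outside the hole.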
