Techniques for online exploration of large object-relational datasets

This paper reviews techniques for exploring large object-relational scientific and statistical databases interactively, in an online manner. The idea is to give the user continuously updated running estimates of the final query results, together with an indication of how precise those estimates are. The user can then halt the query as soon as the answer is sufficiently precise; the time needed to obtain an acceptable approximate answer can be orders of magnitude shorter than the time needed to process the query completely. We describe methods for online processing of aggregation queries, online visualization, and online display of a set of result records. We also describe methods that use precomputed results to obtain approximate query answers rapidly.
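As an illustration of the running-estimate idea, here is a minimal sketch of an online AVG. It rests on three assumptions, none of which come from the paper. First, rows arrive in random order, so every prefix is a random sample of the table. Second, precision is reported with a simple large-sample (central-limit) confidence interval. Third, "the user halts the query" is modeled as a fixed tolerance on the interval's half-width. The function name `online_avg` and its parameters are hypothetical, and the estimators and intervals the paper describes may be built differently.

```python
import math
import random
from statistics import NormalDist


def online_avg(values, confidence=0.95, tolerance=0.5, min_samples=30):
    """Yield (n, running_mean, half_width) after each record from the second
    onward; stop early once the large-sample confidence interval's
    half-width is at most `tolerance` (hypothetical stopping rule).

    Assumes `values` arrive in random order, so each prefix behaves like a
    random sample of the full table.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    n, mean, m2 = 0, 0.0, 0.0  # Welford: running mean, sum of squared deviations
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            continue  # a sample variance needs at least two rows
        std = math.sqrt(m2 / (n - 1))  # sample standard deviation
        half_width = z * std / math.sqrt(n)  # CLT-based interval half-width
        yield n, mean, half_width
        if n >= min_samples and half_width <= tolerance:
            return  # estimate is precise enough: "halt the query"


if __name__ == "__main__":
    random.seed(0)
    # Synthetic table, for illustration only.
    table = [random.gauss(100, 15) for _ in range(200_000)]
    random.shuffle(table)  # emulate random-order retrieval
    for n, est, hw in online_avg(table, tolerance=0.5):
        pass  # a UI would redraw the running estimate here
    print(f"after {n} of {len(table)} rows: AVG ~ {est:.2f} +/- {hw:.2f}")
```

Because the half-width shrinks roughly as z·σ/√n, the number of rows needed depends on the data's spread and the requested precision rather than on the table's size. With σ = 15 and a ±0.5 tolerance at 95% confidence, roughly (1.96 · 15 / 0.5)² ≈ 3,460 rows suffice whether the table holds 200,000 rows or 200 million. This is where the orders-of-magnitude savings come from. The trade-off is that halving the tolerance costs about four times as many rows.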
