Partial Materialized Views

Early access to partial query results is highly desirable during exploration of massive data sets. However, it is challenging to provide transactionally consistent, immediate partial results without significantly increasing query execution time. To address this problem, this paper proposes a partial materialized view (PMV) method that caches some of the most frequently accessed results rather than all possible results. Compared to traditional materialized views, the proposed PMVs do not require maintenance during insertion into base relations, and have much smaller storage and maintenance overhead. Upon the arrival of a query, the RDBMS first searches the PMV and returns the cached partial results to the user. Since a large portion of the PMV is cached in memory, this lookup usually finishes within a millisecond. The RDBMS then continues to execute the query to find the remaining results. The efficiency of our PMV method is evaluated through a simulation study, a theoretical analysis, and an initial implementation in PostgreSQL.
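The two-phase answering flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class `PartialMaterializedView`, the function `answer_query`, the `execute` callback, and the LRU-style eviction are all hypothetical names and choices made for illustration. The sketch also assumes result rows are distinct and hashable, and it does not model transactional consistency or deletions from base relations.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple  # a result row; assumed hashable and distinct within one result


class PartialMaterializedView:
    """Hypothetical bounded cache mapping a query key to a few result rows."""

    def __init__(self, max_keys: int = 2, rows_per_key: int = 2) -> None:
        self.max_keys = max_keys          # how many query keys are retained
        self.rows_per_key = rows_per_key  # how many rows are kept per key
        self._cache: "OrderedDict[Hashable, List[Row]]" = OrderedDict()

    def lookup(self, key: Hashable) -> List[Row]:
        """Return the cached partial results for key (empty list on a miss)."""
        rows = self._cache.get(key)
        if rows is None:
            return []
        self._cache.move_to_end(key)  # mark the key as recently used
        return list(rows)

    def remember(self, key: Hashable, rows: Iterable[Row]) -> None:
        """Keep only the first rows_per_key rows, evicting old keys if full."""
        self._cache[key] = list(rows)[: self.rows_per_key]
        self._cache.move_to_end(key)
        while len(self._cache) > self.max_keys:
            self._cache.popitem(last=False)  # assumed policy: evict LRU key


def answer_query(
    pmv: PartialMaterializedView,
    key: Hashable,
    execute: Callable[[Hashable], Iterable[Row]],
) -> Iterator[Row]:
    """Yield cached partial results first, then the remaining results."""
    cached = pmv.lookup(key)
    yield from cached  # phase 1: immediate partial results from the PMV

    seen = set(cached)
    full: List[Row] = []
    for row in execute(key):  # phase 2: regular query execution
        full.append(row)
        if row not in seen:   # skip rows the user has already received
            yield row
    pmv.remember(key, full)   # refresh the cached subset for later queries
```

Because `answer_query` is a generator, a caller can display the cached rows as soon as they are yielded while the full execution is still running. The cache is refreshed only when the generator is consumed to the end.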
