CORADD: Correlation Aware Database Designer for Materialized Views and Indexes

We describe an automatic database design tool that exploits correlations between attributes when recommending materialized views (MVs) and indexes. Although there is a substantial body of related work exploring how to select an appropriate set of MVs and indexes for a given workload, none of this work has explored the effect of correlated attributes (e.g., attributes encoding related geographic information) on designs. Our tool identifies a set of MVs and secondary indexes such that correlations between the clustered attributes of the MVs and the secondary indexes are enhanced, which can dramatically improve query performance. It uses a form of Integer Linear Programming (ILP) called ILP Feedback to pick the best set of MVs and indexes for given database size constraints. We compare our tool with a state-of-the-art commercial database designer on two workloads, APB-1 and SSB (Star Schema Benchmark---similar to TPC-H). Our results show that a correlation-aware database designer can improve query performance up to 6 times within the same space budget when compared to a commercial database designer.

[1]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[2]  Chuan Yi Tang,et al.  A 2.|E|-Bit Distributed Algorithm for the Directed Euler Trail Problem , 1993, Inf. Process. Lett..

[3]  Jacques Desrosiers,et al.  Selected Topics in Column Generation , 2002, Oper. Res..

[4]  Paul Brown,et al.  BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data , 2003, VLDB.

[5]  Anastasia Ailamaki,et al.  An Integer Linear Programming Approach to Database Design , 2007, 2007 IEEE 23rd International Conference on Data Engineering Workshop.

[6]  Surajit Chaudhuri,et al.  An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server , 1997, VLDB.

[7]  Timos K. Sellis,et al.  Processing Star Queries on Hierarchically-Clustered Fact Tables , 2002, VLDB.

[8]  Phillip B. Gibbons Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports , 2001, VLDB.

[9]  Surajit Chaudhuri,et al.  Automated Selection of Materialized Views and Indexes in SQL Databases , 2000, VLDB.

[10]  Surajit Chaudhuri,et al.  Index merging , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[11]  Xuedong Chen,et al.  Adjoined Dimension Column Clustering to Improve Data Warehouse Query Performance , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[12]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[13]  Rajeev Motwani,et al.  Towards estimation error guarantees for distinct values , 2000, PODS.

[14]  Stanley B. Zdonik,et al.  Correlation Maps: A Compressed Access Method for Exploiting Soft Functional Dependencies , 2009, Proc. VLDB Endow..

[15]  Paul Brown,et al.  CORDS: automatic discovery of correlations and soft functional dependencies , 2004, SIGMOD '04.