What can partitioning do for your data warehouses and data marts?

Efficient query processing is a critical requirement for data warehousing systems as decision support applications often require minimum response times to answer complex, ad-hoc queries having aggregations, multi-ways joins over vast repositories of data. This can be achieved by fragmenting warehouse data. The data fragmentation concept in the context of distributed databases aims to reduce query execution time and facilitates the parallel execution of queries. In this paper, we propose a methodology for applying the fragmentation technique in a data warehouse star schema to reduce the total query execution cost. We present an algorithm for fragmenting the tables of a star schema. During the fragmentation process, we observe that the choice of the dimension tables used in fragmenting the fact table plays an important role on overall performance. Therefore, we develop a greedy algorithm in selecting "best" dimension tables. We propose an analytical cost model for executing a set of OLAP queries on a fragmented star schema. Finally, we conduct some experiments to evaluate the utility of the fragmentation for efficiently executing OLAP queries.

[1]  Joseph M. Firestone Data Warehouses and Data Marts : A Dynamic View , 2002 .

[2]  Mukesh K. Mohania,et al.  OLAP query processing for partitioned data warehouses , 1999, Proceedings 1999 International Symposium on Database Applications in Non-Traditional Environments (DANTE'99) (Cat. No.PR00496).

[3]  Kenneth A. Ross,et al.  Faster Joins, Self-Joins and Multi-Way Joins Using Join Indices , 1997, Data Knowl. Eng..

[4]  Patrick E. O'Neil,et al.  Improved query performance with variant indexes , 1997, SIGMOD '97.

[5]  Ken Barker,et al.  A horizontal fragmentation algorithm for the fact relation in a distributed data warehouse , 1999, CIKM '99.

[6]  Krithi Ramamritham,et al.  Curio: A Novel Solution for Efficient Storage and Indexing in Data Warehouses , 1999, VLDB.

[7]  Bongki Moon,et al.  A case for parallelism in data warehousing and OLAP , 1998, Proceedings Ninth International Workshop on Database and Expert Systems Applications (Cat. No.98EX130).

[8]  Surajit Chaudhuri,et al.  Index merging , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[9]  Hui Lei,et al.  Faster Joins, Self-Joins and Multi-Way Joins Using Join Indices , 1999, Data Knowl. Eng..

[10]  Patrick Valduriez,et al.  Principles of Distributed Database Systems , 1990 .

[11]  Surajit Chaudhuri,et al.  Maintenance of Materialized Views: Problems, Techniques, and Applications. , 1995 .

[12]  Stefano Ceri,et al.  Horizontal data partitioning in database design , 1982, SIGMOD '82.

[13]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[14]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .