Referential Horizontal Partitioning Selection Problem in Data Warehouses: Hardness Study and Selection Algorithms

Horizontal partitioning has been widely adopted by the database community, where it plays a significant part in the physical design process. It is supported by most commercial database management systems (DBMS), which offer a native Data Definition Language for decomposing tables and materialized views using various modes. In traditional databases, horizontal partitioning has been studied extensively, and several fragmentation algorithms have been proposed to partition tables in isolation. In the relational data warehouse environment, horizontal partitioning consists of decomposing the whole warehouse schema into sub-schemas, where each sub-schema contains fragments of the dimension and fact tables. Dimension tables are fragmented using the primary partitioning mode, whereas the fact table is divided using the referential mode. In this article, the authors first focus on the evolution of horizontal partitioning in commercial DBMS, motivated by decision support applications. Secondly, they formalize the referential fragmentation schema selection problem in the data warehouse and study the hardness of selecting an optimal solution. Because of the problem's high complexity, they develop two algorithms, hill climbing and simulated annealing, with several variants, to select a near-optimal partitioning schema. Finally, extensive experimental studies are conducted on the data set of the APB1 benchmark to compare the quality of the proposed algorithms using a mathematical cost model. Based on these experiments, some recommendations are given to advise database administrators on making good use of horizontal partitioning.
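To make the selection problem concrete, the following is a minimal Python sketch of a simulated annealing search over partitioning schemas. It is not the authors' algorithm or cost model: the attribute domains, the query workload, the fragment bound `MAX_FRAGMENTS`, and the cost function (the fraction of the fact table each query scans, assuming uniformly distributed data) are all hypothetical and chosen only for illustration. A schema assigns every value of a dimension attribute to a group, and the fact table, partitioned by reference, is assumed to have one fragment per combination of groups.

```python
import math
import random

# Hypothetical toy inputs: attribute domains and a workload of selection predicates.
DOMAINS = {
    "month": list(range(12)),
    "region": ["N", "S", "E", "W"],
}
QUERIES = [
    {"month": {0, 1, 2}},
    {"region": {"N"}},
    {"month": {11}, "region": {"S", "E"}},
]
MAX_FRAGMENTS = 8  # assumed bound on the number of fact fragments


def fragment_count(schema):
    """Fact fragments = product of the number of groups per attribute."""
    n = 1
    for groups in schema.values():
        n *= len(set(groups.values()))
    return n


def cost(schema):
    """Toy cost: fraction of the fact table each query scans, summed over queries."""
    total = 0.0
    for query in QUERIES:
        fraction = 1.0
        for attr, wanted in query.items():
            groups = schema[attr]
            touched = {groups[v] for v in wanted}
            scanned = sum(1 for g in groups.values() if g in touched)
            fraction *= scanned / len(groups)
        total += fraction
    return total


def neighbor(schema, rng):
    """Move one domain value to a randomly chosen group of the same attribute."""
    attr = rng.choice(list(schema))
    value = rng.choice(DOMAINS[attr])
    candidate = {a: dict(g) for a, g in schema.items()}
    candidate[attr][value] = rng.randrange(len(DOMAINS[attr]))
    return candidate


def simulated_annealing(steps=2000, temperature=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    # Start from the unpartitioned schema: every value in group 0.
    current = {a: {v: 0 for v in dom} for a, dom in DOMAINS.items()}
    best = current
    for _ in range(steps):
        candidate = neighbor(current, rng)
        # Discard schemas that exceed the fragment bound.
        if fragment_count(candidate) <= MAX_FRAGMENTS:
            delta = cost(candidate) - cost(current)
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = candidate
                if cost(current) < cost(best):
                    best = current
        temperature *= cooling
    return best
```

Calling `simulated_annealing()` returns the best schema found within the fragment bound, and `cost(best)` can be compared against the cost of the unpartitioned schema. Accepting only moves with `delta <= 0` would turn the same loop into a simple hill climbing variant, which is faster but more likely to stop in a local optimum.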
