Do Rule-Based Approaches Still Make Sense in Logical Data Warehouse Design?

Like any product design, data warehouse applications follow a well-known life-cycle. Historically, it included only the physical phase, and it was gradually extended to include the conceptual and the logical phases. The management of these phases, whether within a phase or across phases, is dominated by rule-based approaches. More recently, a cost-based approach has been proposed to replace rule-based approaches in the physical design phase in order to optimize queries. Unlike the traditional rule-based approach, it explores a huge search space of solutions (e.g., query execution plans) and then, based on a cost model, selects the most suitable one(s). The logical design phase, on the other hand, is still managed by rule-based approaches applied to the conceptual schema. In this paper, we propose to extend the cost-based vision to the logical phase. As a consequence, selecting a logical design for a given data warehouse schema becomes an optimization problem with a huge search space, generated by the correlations (e.g., hierarchies) between data warehouse concepts. The best logical schema is then selected by means of a cost model that estimates the overall query processing cost. Finally, a case study using the Star Schema Benchmark is presented to show the effectiveness of our proposal.
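To make the idea concrete, the following is a minimal sketch of cost-based logical schema selection, not the method of the paper. It assumes that each candidate schema is obtained by choosing, per dimension, how many of the coarsest hierarchy levels are split off into separate tables (0 corresponds to a star schema, the maximum to a full snowflake). The hierarchies, cardinalities, workload, and the toy cost formula are all hypothetical and only loosely inspired by the Star Schema Benchmark.

```python
from itertools import product

# Hypothetical dimension hierarchies, finest level first, with made-up cardinalities.
HIERARCHIES = {
    "customer": [("city", 250), ("nation", 25), ("region", 5)],
    "supplier": [("city", 250), ("nation", 25), ("region", 5)],
}
DIM_ROWS = {"customer": 30_000, "supplier": 2_000}  # assumed dimension sizes
FACT_ROWS = 6_000_000                               # assumed fact table size

# Toy workload: (hierarchy level filtered per dimension, query frequency).
WORKLOAD = [
    ({"customer": "region", "supplier": "nation"}, 10),
    ({"customer": "city"}, 3),
]


def candidate_schemas():
    """Yield every candidate: dimension -> number of coarsest levels split off."""
    dims = list(HIERARCHIES)
    ranges = [range(len(HIERARCHIES[d]) + 1) for d in dims]
    for cuts in product(*ranges):
        yield dict(zip(dims, cuts))


def query_cost(schema, filters):
    """Toy cost: fact scan, dimension scan weighted by width, plus extra joins."""
    cost = FACT_ROWS
    for dim, level_name in filters.items():
        levels = HIERARCHIES[dim]
        inline = len(levels) - schema[dim]       # levels kept in the dimension table
        cost += DIM_ROWS[dim] * (1 + inline)     # wider rows are costlier to scan
        target = [name for name, _ in levels].index(level_name)
        # Each split-off table on the path to the filtered level adds one join.
        for _, cardinality in levels[inline:target + 1]:
            cost += DIM_ROWS[dim] + cardinality
    return cost


def total_cost(schema):
    """Frequency-weighted cost of the whole workload under one candidate schema."""
    return sum(freq * query_cost(schema, filters) for filters, freq in WORKLOAD)


def select_schema():
    """Exhaustively explore the search space and keep the cheapest candidate."""
    return min(candidate_schemas(), key=total_cost)


if __name__ == "__main__":
    best = select_schema()
    print(best, total_cost(best))
```

Exhaustive enumeration is only workable here because the toy search space has 16 candidates; with many dimensions and deeper hierarchies the space grows multiplicatively, which is why the selection is framed as an optimization problem.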
