Data mining based fragmentation technique for distributed data warehouses environment Using predicate construction technique

Distributed Data Warehouses (DDWs) offer several advantages over traditional centralized environments. Such an architecture improves system performance by allowing data to be spread across data marts. Queries can then be run over smaller data sets, which reduces their execution time. To design an effective distributed model, it is important to adopt an appropriate methodology for data fragmentation and fragment allocation. Nevertheless, very few works address this problem in a distributed context. This paper focuses on DDWs. It proposes a data mining-based horizontal fragmentation methodology for a relational DDW environment. This methodology combines the known predicate construction technique with a clustering method to fragment Data Warehouse (DW) relations. Each fragment is then allocated to a site according to its frequency of use at that site. We show experimentally, using the APB-1 Release II benchmark, that DW decentralization gives better performance: global query execution time is reduced by 80%.
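The pipeline summarized above (predicate construction, clustering, then frequency-based allocation) can be illustrated with a small sketch. It is not the paper's algorithm: the predicate names, the workload, the sample rows, the choice of k-means with farthest-point seeding, and the rule of assigning each row to the nearest cluster centroid are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical simple predicates over a fact row (illustrative names only).
PREDICATES = {
    "region_eu": lambda r: r["region"] == "EU",
    "region_us": lambda r: r["region"] == "US",
    "year_2023": lambda r: r["year"] == 2023,
    "year_2024": lambda r: r["year"] == 2024,
}

# Hypothetical workload: (issuing site, frequency, predicates the query uses).
WORKLOAD = [
    ("site_A", 50, {"region_eu", "year_2023"}),
    ("site_A", 30, {"region_eu"}),
    ("site_B", 40, {"region_us", "year_2024"}),
    ("site_B", 10, {"region_us"}),
]

# Hypothetical fact rows.
FACT_ROWS = [
    {"region": "EU", "year": 2023, "sales": 10},
    {"region": "EU", "year": 2024, "sales": 20},
    {"region": "US", "year": 2024, "sales": 30},
    {"region": "US", "year": 2023, "sales": 40},
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vector, centroids[c]))


def kmeans(vectors, k, iterations=10):
    """Plain k-means with deterministic farthest-point seeding (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def fragment_and_allocate(rows, workload, predicates, k):
    names = sorted(predicates)
    # 1. Predicate usage: one binary vector per query.
    usage = [[1.0 if n in used else 0.0 for n in names] for _, _, used in workload]
    # 2. Cluster queries that use similar predicates.
    labels, centroids = kmeans(usage, k)
    # 3. Horizontal fragmentation: each row joins the cluster whose centroid is
    #    closest to the row's predicate-satisfaction vector, so the fragments
    #    are disjoint and together cover every row.
    fragments = defaultdict(list)
    for row in rows:
        satisfied = [1.0 if predicates[n](row) else 0.0 for n in names]
        fragments[nearest(satisfied, centroids)].append(row)
    # 4. Allocation: each fragment goes to the site that uses it most often.
    site_freq = defaultdict(lambda: defaultdict(int))
    for (site, freq, _), label in zip(workload, labels):
        site_freq[label][site] += freq
    allocation = {c: max(f, key=f.get) for c, f in site_freq.items()}
    return dict(fragments), allocation


fragments, allocation = fragment_and_allocate(FACT_ROWS, WORKLOAD, PREDICATES, k=2)
print(allocation)
```

With this toy workload, the EU rows end up in the fragment allocated to `site_A` and the US rows in the fragment allocated to `site_B`. A real design would derive the predicates and frequencies from the observed query workload and would pick the number of clusters to match the number of sites or a cost model.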
