DYFRAM: dynamic fragmentation and replica management in distributed database systems

In distributed database systems, tables are frequently fragmented and replicated over a number of sites in order to reduce network communication costs. How to fragment, when to replicate, and how to allocate the fragments to the sites are challenging problems that have previously been solved either by static fragmentation, replication, and allocation, or by a priori query analysis. Many emerging applications of distributed database systems generate very dynamic workloads, with frequent changes in access patterns from different sites. In such contexts, continuous refragmentation and reallocation can significantly improve performance. In this paper, we present DYFRAM, a decentralized approach for dynamic table fragmentation and allocation in distributed database systems, based on observation of the access patterns of sites to tables. The approach performs fragmentation, replication, and reallocation based on recent access history, aiming to maximize the number of local accesses relative to accesses from remote sites. We show through simulations and experiments on the DASCOSA distributed database system that the approach significantly reduces communication costs for typical access patterns, thus demonstrating its feasibility.
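The idea of reallocating based on recent access history can be illustrated with a small sketch. This is not DYFRAM's algorithm, which the abstract does not specify; it is a simplified illustration that assumes a fixed-size sliding window of accesses per fragment and a rule that moves a fragment only when some remote site has accessed it strictly more often than its current owner. The class name `FragmentAccessLog`, the `window` parameter, and the method names are hypothetical.

```python
from collections import Counter, deque


class FragmentAccessLog:
    """Hypothetical per-fragment log of which sites accessed it recently."""

    def __init__(self, owner, window=100):
        self.owner = owner                  # site currently holding the fragment
        self.recent = deque(maxlen=window)  # sliding window of accessing sites

    def record(self, site):
        """Remember one access; the oldest entry drops out when the window is full."""
        self.recent.append(site)

    def preferred_site(self):
        """Return the site with the most recent accesses.

        The current owner is kept unless another site has strictly more
        accesses, which avoids moving the fragment back and forth on ties.
        """
        counts = Counter(self.recent)
        best = self.owner
        for site, n in counts.items():
            if n > counts[best]:
                best = site
        return best


log = FragmentAccessLog(owner="A", window=4)
for site in ["A", "B", "B"]:
    log.record(site)
print(log.preferred_site())  # site B now dominates the window
```

A real system would also have to weigh reads against writes, the cost of moving the fragment, and whether to replicate rather than migrate; the sketch ignores all of these.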
