Data Declustering with Replications

Declustering distributes blocks of data across multiple devices, enabling parallel I/O access and reducing query response times. Many data declustering schemes have been proposed in the literature. However, these schemes are designed for systems without replication, and thus they fail if any single disk fails. Assuming that a single disk fails once every five years, a system of 100 disks without replication would fail about once every 18 days. Data replication is a technique commonly used in multidisk systems to keep data available during disk failures and, often as a secondary goal, to improve the I/O performance of read-intensive applications. In this paper, we propose LOG, a data declustering scheme for systems with replication. Furthermore, we present a novel replication algorithm. Although the replication algorithm is designed for the LOG declustering scheme, it is also applicable to existing schemes such as DM, GFIB, and GRS. Finally, as demonstrated by our experimental results, the LOG scheme with the proposed replication algorithm provides a significant performance improvement over state-of-the-art data declustering schemes.
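
The 18-day figure above can be reproduced with a short calculation. The following is a minimal sketch, assuming that disks fail independently at a constant rate and that a year has 365 days; the function name `days_between_failures` and its parameters are illustrative and do not come from the paper.

```python
def days_between_failures(disk_lifetime_years: float,
                          num_disks: int,
                          days_per_year: float = 365.0) -> float:
    """Expected number of days between disk failures in the whole system.

    Assumption (not stated in the paper): disks fail independently at a
    constant rate, so the system-wide failure rate is num_disks times the
    per-disk rate, and the expected interval shrinks by the same factor.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    return disk_lifetime_years * days_per_year / num_disks


# One failure per disk every five years, 100 disks, no replication.
print(days_between_failures(5, 100))
```

Under these assumptions the interval is 5 × 365 / 100 = 18.25 days, which matches the roughly 18 days quoted in the abstract.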
