Performance evaluation of grid based multi-attribute record declustering methods

We focus on multi-attribute declustering methods that are based on some form of grid-based partitioning of the data space. We derive theoretical results showing that no declustering method can be strictly optimal for range queries when the number of disks is greater than 5. A detailed performance evaluation then examines how various declustering schemes perform under a wide range of query and database scenarios, both relative to each other and relative to the optimal. The parameters varied include the shape and size of queries, the database size, the number of attributes, and the number of disks. The results show that information about the common queries on a relation is very important and ought to be used in choosing its declustering, and that this is especially crucial for small queries. Moreover, there is no clear winner, so parallel database systems must support a number of declustering methods.
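
To make the notion of strict optimality concrete, the following is a minimal sketch rather than the evaluation code behind these results. It assumes Disk Modulo allocation as one illustrative grid-based method, in which grid cell (i1, ..., ik) is stored on disk (i1 + ... + ik) mod M; the abstract does not name the schemes evaluated. It also assumes the usual cost model in which a range query's response time is the largest number of its cells found on any single disk, so the optimum is ceil(cells / M). The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import ceil


def disk_modulo(cell, num_disks):
    """Assumed allocation rule: sum of the cell's grid coordinates, mod M."""
    return sum(cell) % num_disks


def response_time(query, num_disks, allocate=disk_modulo):
    """Largest number of query cells placed on any one disk.

    `query` is a list of inclusive (lo, hi) grid ranges, one per attribute.
    """
    cells = product(*(range(lo, hi + 1) for lo, hi in query))
    load = Counter(allocate(cell, num_disks) for cell in cells)
    return max(load.values())


def optimal_time(query, num_disks):
    """Lower bound: the query's cells spread perfectly evenly over the disks."""
    num_cells = 1
    for lo, hi in query:
        num_cells *= hi - lo + 1
    return ceil(num_cells / num_disks)


if __name__ == "__main__":
    disks = 4
    square = [(0, 1), (0, 1)]  # 2 x 2 range query
    row = [(0, 0), (0, 3)]     # 1 x 4 range query
    for name, q in (("square", square), ("row", row)):
        print(name, response_time(q, disks), optimal_time(q, disks))
```

Under these assumptions the 1 x 4 query is served optimally, while the 2 x 2 query is not: two of its four cells land on the same disk. This shows on a small scale how query shape affects the performance of a given scheme.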
