Multi-Site Declustering Strategies for Very High Database Service Availability

The thesis introduces the concept of multi-site declustering strategies with self repair for databases demanding very high service availability. Existing work on declustering strategies is centered on providing high performance and reliability inside a small geographical area (a site). Applications that demand robustness against site failures, such as fire and power outages, cannot use these methods. Such applications often replicate information inside one site and then replicate the whole site at another site, which results in unnecessarily high redundancy cost. Multi-site declustering provides robustness against site failures with only two replicas of the data, without compromising performance or reliability. Self repair is proposed to reduce the probability of double failures causing data loss and to reduce the need for rapid replacement of failed hardware. A prerequisite for multi-site declustering with self repair is a fast, long-distance communication network such as ATM. The thesis shows how existing declustering strategies such as Mirrored, Interleaved, Chained, and HypRa declustering can be used as multi-site declustering strategies. In addition, a new strategy called Q-rot declustering is proposed. Compared with the others, it gives greater flexibility with respect to repair strategy, number of sites, and usage pattern. To evaluate the availability of systems using these methods, a general evaluation model has been developed. Multi-site Chained declustering provides the best availability of the methods evaluated. Q-rot declustering has comparable availability but is significantly more flexible. The evaluation model provides insight into the declustering problem and can be used to develop new and improved multi-site declustering strategies. The model can also be used as a configuration tool by organizations wanting to deploy one of the declustering strategies.
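
As a rough illustration of how a two-replica placement can survive the loss of a whole site, the sketch below applies the usual chained declustering rule (primary copy of fragment i on node i, backup copy on node i+1, wrapping around) to nodes spread over several sites. The round-robin assignment of nodes to sites, and the names `chained_placement`, `site_of`, and `survives_site_failure`, are assumptions made for this example; the abstract does not describe the thesis's actual multi-site layout, and Q-rot declustering is not modeled here.

```python
def site_of(node, num_sites):
    """Assumed layout: nodes are assigned to sites round-robin."""
    return node % num_sites


def chained_placement(num_nodes, num_sites):
    """Return {fragment: (primary_node, backup_node)} for chained declustering.

    Fragment i has its primary copy on node i and its backup copy on
    node (i + 1) mod num_nodes. With round-robin site assignment and
    num_nodes a multiple of num_sites, the two copies always land on
    different sites.
    """
    if num_sites < 2 or num_nodes % num_sites != 0:
        raise ValueError("need at least 2 sites and num_nodes divisible by num_sites")
    return {frag: (frag, (frag + 1) % num_nodes) for frag in range(num_nodes)}


def survives_site_failure(placement, num_sites, failed_site):
    """True if every fragment keeps at least one copy outside the failed site."""
    return all(
        any(site_of(node, num_sites) != failed_site for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    placement = chained_placement(num_nodes=6, num_sites=2)
    for frag, (primary, backup) in placement.items():
        print(f"fragment {frag}: primary on node {primary}, backup on node {backup}")
    for site in range(2):
        print(f"site {site} fails, all data reachable:",
              survives_site_failure(placement, 2, site))
```

The check only covers the loss of a single site. It says nothing about load balance after a failure or about self repair, which are the properties the evaluation model in the thesis is meant to compare.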
