Critical issues in the design of a fault-tolerant multiprocessor database server

HypRa is a database server designed to meet extreme requirements, such as those of telecom applications. Fault tolerance has been designed into HypRa by (1) taking it into account at all hardware and software levels; (2) using a shared-nothing, coarse-grained hardware platform with homogeneous nodes, basic fault detection, and a multiway interconnection network with dynamic rerouting; (3) performing fault masking and repair in software to achieve flexibility and dynamic reconfigurability; and (4) using data fragmentation, fragment replication, replica allocation, and dynamic reconfiguration to automatically re-establish the required level of fault tolerance for data availability after failures.
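Item (4) can be illustrated with a minimal Python sketch of how fragmentation, replication, and reconfiguration might work together to restore redundancy after one node fails. This is not HypRa's actual algorithm. The stable hash (`zlib.crc32`), the placement rule (replica i of fragment f on node (f + i) mod n), the least-loaded choice of new replica sites, the single-failure assumption, and all names (`fragment_of`, `allocate_replicas`, `reconfigure`, the node and key names) are hypothetical and chosen only for illustration.

```python
import zlib
from collections import Counter

# Hypothetical sketch, not HypRa's implementation: hash fragmentation,
# k-way fragment replication, and re-replication after one node fails.


def fragment_of(key, num_fragments):
    """Hash-based fragmentation: map a record key to a fragment number."""
    # crc32 is stable across runs (unlike Python's salted str hash()).
    return zlib.crc32(str(key).encode()) % num_fragments


def allocate_replicas(num_fragments, nodes, k):
    """Place k replicas of each fragment on k distinct nodes; replica 0 is the primary."""
    if k > len(nodes):
        raise ValueError("replication degree k exceeds the number of nodes")
    # Assumed rule: replica i of fragment f goes to node (f + i) mod n.
    return {f: [nodes[(f + i) % len(nodes)] for i in range(k)]
            for f in range(num_fragments)}


def reconfigure(placement, nodes, failed, k):
    """Restore k replicas per fragment after node `failed` goes down."""
    alive = [n for n in nodes if n != failed]
    # Current number of replicas held by each surviving node.
    load = Counter(n for reps in placement.values() for n in reps if n != failed)
    new_placement = {}
    for f, reps in placement.items():
        # Keep surviving replicas in order, so the first survivor becomes
        # primary when the old primary was on the failed node.
        survivors = [n for n in reps if n != failed]
        if not survivors:
            raise RuntimeError(f"fragment {f} lost every replica")
        # Candidate sites: alive nodes not already holding this fragment,
        # least loaded first (ties broken by name).
        candidates = sorted((n for n in alive if n not in survivors),
                            key=lambda n: (load[n], n))
        for target in candidates[:k - len(survivors)]:
            survivors.append(target)  # copy the fragment to a new node
            load[target] += 1
        new_placement[f] = survivors
    return new_placement


nodes = ["n0", "n1", "n2", "n3"]
placement = allocate_replicas(8, nodes, k=2)
after = reconfigure(placement, nodes, failed="n1", k=2)
primary_site = after[fragment_of("subscriber-4711", 8)][0]
```

Promoting the first surviving replica keeps each fragment available at once, while the copy step rebuilds the lost redundancy. If too few nodes survive, the sketch leaves a fragment under-replicated rather than failing. Choosing the least-loaded site is only one plausible allocation policy.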