Supporting Fault-Tolerant Parallel Programming in Linda

Linda is a language for programming parallel applications whose most notable feature is a distributed shared memory called tuple space. While Linda is suitable for a wide variety of programs, one shortcoming of the language as commonly defined and implemented is its lack of support for writing programs that can tolerate failures in the underlying computing platform. This paper describes FT-Linda, a version of Linda that addresses this problem by providing two major enhancements that facilitate the writing of fault-tolerant applications: stable tuple spaces and atomic execution of tuple space operations. The former is a type of stable storage in which tuple values are guaranteed to persist across failures, while the latter allows collections of tuple operations to be executed in an all-or-nothing fashion despite failures and concurrency. The design of these enhancements is presented in detail and illustrated by examples drawn from both the Linda and fault-tolerance domains. An implementation of FT-Linda for a network of workstations is also described. The design is based on replicating the contents of stable tuple spaces to provide failure resilience and then updating the copies using atomic multicast. This strategy allows an efficient implementation in which only a single multicast message is needed for each atomic collection of tuple space operations.
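
To make the two enhancements concrete, the following is a minimal single-process sketch in Python, not FT-Linda syntax or the paper's implementation. It assumes a hypothetical `TupleSpace` class whose `atomic` method withdraws one matching tuple and deposits its replacements in a single all-or-nothing step, and a hypothetical `ReplicatedSpace` class that applies each operation to every replica in the same order. The plain loop over replicas only stands in for atomic multicast; there is no real network, persistence, or failure handling here.

```python
import threading

ANY = object()  # wildcard standing in for a formal (unbound) template field


def matches(template, tup):
    """True if tup has the template's arity and agrees on every non-wildcard field."""
    return len(template) == len(tup) and all(
        t is ANY or t == v for t, v in zip(template, tup)
    )


class TupleSpace:
    """One in-memory tuple space (hypothetical; no persistence or networking)."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    def out(self, tup):
        """Deposit a tuple."""
        with self._lock:
            self._tuples.append(tuple(tup))

    def atomic(self, guard, body):
        """Withdraw a tuple matching guard and deposit body(matched) as one step.

        Returns the withdrawn tuple, or None (changing nothing) if none matches.
        """
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if matches(guard, tup):
                    # Build the new tuples first, so an exception raised by
                    # body leaves the space untouched.
                    new = [tuple(t) for t in body(tup)]
                    self._tuples.pop(i)
                    self._tuples.extend(new)
                    return tup
            return None

    def snapshot(self):
        """Return a copy of the current contents."""
        with self._lock:
            return list(self._tuples)


class ReplicatedSpace:
    """Applies every operation to all replicas in the same order.

    Each call models one message per atomic collection of operations;
    body must be deterministic so that the replicas stay identical.
    """

    def __init__(self, n):
        self.replicas = [TupleSpace() for _ in range(n)]

    def out(self, tup):
        for replica in self.replicas:
            replica.out(tup)

    def atomic(self, guard, body):
        results = [replica.atomic(guard, body) for replica in self.replicas]
        return results[0]


# Illustrative use: a worker claims a task and records that it is in progress.
space = ReplicatedSpace(3)
space.out(("task", 7))
claimed = space.atomic(
    ("task", ANY), lambda m: [("in_progress", "worker-1", m[1])]
)
```

Because the withdrawal and the deposit happen in one step, no replica can ever hold a state in which the task has been removed but not yet recorded as in progress, which is the kind of window that would otherwise lose work if the worker failed between two separate operations.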
