Paper information - Runtime system level fault tolerance for a distributed functional language

Distributed fault tolerance entails detecting errors, confining the damage they cause, recovering from them, and providing continued service on a network of co-operating machines. Functional languages potentially offer benefits for distributed fault tolerance: many computations are pure, and hence have no side effects to be reversed during error recovery. Moreover, functional languages have a high-level runtime system (RTS) in which computations and data are readily manipulated. We propose a new RTS level of fault tolerance for distributed functional languages, and outline a design for its implementation in Glasgow distributed Haskell (GdH). GdH is a small extension to the Haskell language, and the fault tolerance design utilises existing distributed graph reduction mechanisms. The design distinguishes between pure and impure computations: impure, or side-effecting, computations must be recovered using conventional exception-based techniques, but the RTS attempts implicit backward recovery of pure computations.
