Fast fault-tolerant parallel communication and on-line maintenance using information dispersal

Space-efficient Information Dispersal Algorithm (IDA) [11] is applied to parallel communication in the hypercube. Let $N$ denote the size of the network. Our communication scheme runs in $2\log N + 1$ time using constant-size buffers. Its probability of successful routing is at least $1 - N^{-2.419\log N + 1.5}$, proving Rabin's conjecture. The same scheme also tolerates $O(N)$ random link failures with high probability. The scheme runs within the said time bound without long delay. On-line and efficient wire testing and replacement on the hypercube can be realized if our fault-tolerant routing scheme is used. Let $\alpha$ denote the total number of links in the hypercube. It is shown that $\approx \alpha/352$ wires can be disabled simultaneously without disrupting the ongoing computation or degrading the routing performance much.
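To give a feel for the magnitudes involved, the stated bounds can be evaluated for a concrete hypercube. The following is only an illustrative sketch, not part of the paper: the function name `hypercube_bounds` is hypothetical, logarithms are assumed to be base 2, and the link count assumes undirected links, i.e. $n \cdot N / 2$ for $N = 2^n$ nodes (a directed count would double it).

```python
def hypercube_bounds(n: int) -> dict:
    """Evaluate the abstract's bounds for an n-dimensional hypercube.

    Assumptions (not stated in the abstract): log is base 2, and links
    are counted as undirected edges.
    """
    num_nodes = 2 ** n                      # N
    log_n = n                               # log2(N)
    routing_time = 2 * log_n + 1            # 2 log N + 1 steps

    # Failure probability is at most N^(-2.419 log N + 1.5).
    # Work with its base-2 logarithm to avoid overflow for large n.
    log2_failure_bound = log_n * (-2.419 * log_n + 1.5)
    failure_bound = 2.0 ** log2_failure_bound  # underflows to 0.0 when tiny

    num_links = n * num_nodes // 2          # assumed undirected link count
    disableable_links = num_links / 352     # roughly alpha / 352

    return {
        "nodes": num_nodes,
        "routing_time": routing_time,
        "log2_failure_bound": log2_failure_bound,
        "failure_bound": failure_bound,
        "links": num_links,
        "disableable_links": disableable_links,
    }


if __name__ == "__main__":
    for dim in (4, 10, 16):
        print(dim, hypercube_bounds(dim))
```

Under these assumptions, a 10-dimensional hypercube (1024 nodes) routes in 21 steps, and about 14 of its 5120 links could be taken offline at once for testing.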

[1] Bruce M. Maggs, et al. Universal packet routing algorithms, 1988, [Proceedings 1988] 29th Annual Symposium on Foundations of Computer Science.

[2] Leslie G. Valiant, et al. Universal schemes for parallel communication, 1981, STOC '81.

[3] Abhiram G. Ranade, et al. How to emulate shared memory, 1991, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[4] Frank Thomson Leighton, et al. Fast computation using faulty hypercubes, 1989, STOC '89.

[5] Debasis Mitra, et al. Randomized Parallel Communications, 1986, ICPP.

[6] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, 1952.

[7] Kai Hwang, et al. Computer architecture and parallel processing, 1984, McGraw-Hill Series in computer organization and architecture.

[8] Romas Aleliunas, et al. Randomized parallel communication (Preliminary Version), 1982, PODC '82.

[9] E. T. An Introduction to the Theory of Numbers, 1946, Nature.

[10] F. Preparata. Holographic dispersal and recovery of information, 1989, IEEE Trans. Inf. Theory.

[11] Michael O. Rabin, et al. Efficient dispersal of information for security, load balancing, and fault tolerance, 1989, JACM.

[12] Prabhakar Raghavan, et al. Probabilistic construction of deterministic algorithms: Approximating packing integer programs, 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[13] Leslie G. Valiant, et al. A Scheme for Fast Parallel Communication, 1982, SIAM J. Comput.

[14] Robert S. Swarz, et al. The theory and practice of reliable system design, 1982.