Unsafety vectors: a new fault-tolerant routing for the binary n-cube

This paper presents a new fault-tolerant routing algorithm for the binary n-cube that overcomes the limitations of the recently proposed safety vectors algorithm (IEEE Trans. Parallel Distribut. Syst. 9 (4) (1998) 321). The algorithm is based on the concept of "unsafety vectors". Each node $A$ starts by computing a first-level unsafety set, $S_1^A$, composed of the set of unreachable neighbours. It then performs $m-1$ exchanges with its neighbours to determine the $k$-level unsafety set, $S_k^A$, for all $1 \le k \le m$, where $m$ is an adjustable parameter between 1 and $n$. $S_k^A$ is the set of all nodes at Hamming distance $k$ from node $A$ that are faulty or unreachable from $A$ because of faulty nodes (or links). Using these unsafety sets, each node calculates its unsafety vector, which is then used to achieve efficient fault-tolerant routing in the binary n-cube. The $k$th element of the unsafety vector of node $A$ is a measure of the routing unsafety at distance $k$ from $A$. We present an analytical study proving some properties of the proposed algorithm. We also conduct a comparative analysis through extensive simulation experiments, which reveal the superiority of the proposed algorithm over the safety vectors algorithm (IEEE Trans. Parallel Distribut. Syst. 9 (4) (1998) 321) in terms of several performance measures, such as routing distance and percentage of reachability.
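The unsafety-set computation described above can be sketched in Python for small simulation experiments. This is a minimal sketch under stated assumptions, not the authors' implementation: it models node faults only (no link faults), it assumes that a node $X$ at distance $k$ is "unreachable" from $A$ when no fault-free shortest path joins them, and it derives $S_k^A$ from the sets $S_{k-1}^B$ reported by the neighbours $B$ of $A$ that lie on a shortest path to $X$, with one synchronous exchange round per level. The per-level measure used by the hypothetical `unsafety_vector` helper, $|S_k^A| / \binom{n}{k}$ (the fraction of unsafe nodes at distance $k$), is an illustrative assumption, since the abstract does not define the exact measure. All function names are hypothetical.

```python
from math import comb


def neighbours(node: int, n: int) -> list[int]:
    """Neighbours of `node` in the binary n-cube: flip one bit at a time."""
    return [node ^ (1 << i) for i in range(n)]


def unsafety_sets(n: int, faulty: set[int], m: int) -> dict[int, list[set[int]]]:
    """Return sets[A][k] = S_k^A for every fault-free node A, 1 <= k <= m.

    Assumed model (hypothetical): node faults only; X at distance k is in
    S_k^A when X is faulty or no fault-free shortest path from A to X exists.
    Index 0 of each list is an unused placeholder.
    """
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    nodes = [a for a in range(1 << n) if a not in faulty]
    # Level 1: each node inspects its own neighbours locally (no exchange).
    sets = {a: [set(), {b for b in neighbours(a, n) if b in faulty}] for a in nodes}
    # Levels 2..m: one synchronous exchange round per level, m - 1 rounds total.
    for k in range(2, m + 1):
        new_level = {}
        for a in nodes:
            unsafe = set()
            for x in range(1 << n):
                diff = a ^ x
                if bin(diff).count("1") != k:
                    continue
                # Neighbours of A that lie on a shortest path from A to X.
                preferred = [a ^ (1 << i) for i in range(n) if (diff >> i) & 1]
                # X is unsafe if every such neighbour is faulty or reports X
                # in its own (k-1)-level set received during the exchange.
                if all(b in faulty or x in sets[b][k - 1] for b in preferred):
                    unsafe.add(x)
            new_level[a] = unsafe
        # Commit level k only after every node has computed it (synchronous round).
        for a in nodes:
            sets[a].append(new_level[a])
    return sets


def unsafety_vector(n: int, sets_of_a: list[set[int]]) -> list[float]:
    """Hypothetical measure: fraction of unsafe nodes at each distance k."""
    return [len(sets_of_a[k]) / comb(n, k) for k in range(1, len(sets_of_a))]


if __name__ == "__main__":
    sets = unsafety_sets(n=3, faulty={0b011}, m=3)
    print(sets[0b000][1:])  # S_1, S_2, S_3 of node 000
    print(unsafety_vector(3, sets[0b000]))
```

For example, in a 3-cube with node 011 faulty, node 000 obtains $S_1 = \emptyset$, $S_2 = \{011\}$ and $S_3 = \emptyset$: node 111 stays reachable via 100 and 110, so under the assumed measure the vector of node 000 is $(0, 1/3, 0)$. Committing each level only after all nodes have computed it mimics the $m-1$ message-exchange rounds rather than a centralized search.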

[1] Ming-Syan Chen, et al. Adaptive Fault-Tolerant Routing in Hypercube Multicomputers. IEEE Trans. Computers, 1990.

[2] Quentin F. Stout, et al. Hypercube message routing in the presence of faults. C3P, 1988.

[3] Youran Lan, et al. An Adaptive Fault-Tolerant Routing Algorithm for Hypercube Multicomputers. IEEE Trans. Parallel Distributed Syst., 1995.

[4] Lionel M. Ni, et al. A survey of wormhole routing techniques in direct networks. Computer, 1993.

[5] Justin R. Rattner. Concurrent processing: a new direction in scientific computing. AFIPS Conference Proceedings, 1985.

[6] Lars Lundberg, et al. Performance Optimization Using Extended Critical Path Analysis in Multithreaded Programs on Multiprocessors. J. Parallel Distributed Comput., 2001.

[7] John P. Hayes, et al. A Fault-Tolerant Communication Scheme for Hypercube Computers. IEEE Trans. Computers, 1992.

[8] Jang-Ping Sheu, et al. A Multicast Algorithm for Hypercube Multiprocessors. Parallel Algorithms Appl., 1994.

[9] Theodore R. Bashkow, et al. A large scale, homogeneous, fully distributed parallel machine, I. ISCA '77, 1977.

[10] Jie Wu, et al. Optimal fault-tolerant routing in hypercubes using extended safety vectors. Proceedings Seventh International Conference on Parallel and Distributed Systems, 2000.

[11] Jie Wu. Reliable Unicasting in Faulty Hypercubes Using Safety Levels. IEEE Trans. Computers, 1997.

[12] Youran Lan. A Fault-Tolerant Routing Algorithm in Hypercubes. ICPP, 1994.

[13] Sudhakar Yalamanchili, et al. Adaptive routing protocols for hypercube interconnection networks. Computer, 1993.

[14] Charles L. Seitz, et al. The cosmic cube. CACM, 1985.

[15] Ge-Ming Chiu, et al. A Fault-Tolerant Routing Strategy in Hypercube Multicomputers. IEEE Trans. Computers, 1996.

[16] Jie Wu, et al. Broadcasting in faulty hypercubes. Microprocess. Microprogramming, 1993.

[17] Yousef Saad, et al. Data Communication in Hypercubes. J. Parallel Distributed Comput., 1989.

[18] Cauligi S. Raghavendra, et al. Free dimensions - an effective approach to achieving fault tolerance in hypercube. Digest of Papers, FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing, 1992.

[19] Ge-Ming Chiu, et al. Use of Routing Capability for Fault-Tolerant Routing in Hypercube Multicomputers. IEEE Trans. Computers, 1997.

[20] D. J. Evans, et al. Parallel processing, 1986.

[21] Ge-Ming Chiu, et al. Fault-tolerant routing strategy using routing capability in hypercube multicomputers. Proceedings of 1996 International Conference on Parallel and Distributed Systems, 1996.

[22] Jie Wu, et al. Adaptive Fault-Tolerant Routing in Cube-Based Multicomputers Using Safety Vectors. IEEE Trans. Parallel Distributed Syst., 1998.

[23] Cauligi S. Raghavendra, et al. Achieving Fault Tolerance in Hypercubes, 1995.

[24] Ming-Syan Chen, et al. Depth-First Search Approach for Fault-Tolerant Routing in Hypercube. IEEE Trans. Parallel Distributed Syst., 1990.

[25] Ranga Vemuri, et al. An integrated multicomponent synthesis environment for MCMs. Computer, 1993.

[26] Karen A. Loveland, et al. Large Scale, 1991.