Generating a fault-tolerant global clock using high-speed control signals for the MetaNet architecture

This paper describes a new technique, based on exchanging control signals between neighboring nodes, for constructing a stable and fault-tolerant global clock in a distributed system with an arbitrary topology. It is shown that a global clock reference can be constructed with a time step much smaller than the propagation delay over the network's links. The synchronization algorithm ensures that the global clock "tick" has a stable periodicity, and it can therefore tolerate link failures and clocks that run faster or slower than nominally specified, as well as hard failures. The approach is to generate the global clock from the ensemble of the local transmission clocks rather than to synchronize these high-speed clocks directly. The steady-state algorithm, which generates the global clock, is executed in hardware by the network interface of each node. At the network interface, the propagation delay between neighboring nodes can be measured accurately, with only a small error or uncertainty, and the accuracy of global synchronization is proportional to this measurement error. It is shown that local clock drift (or rate uncertainty) has only a secondary effect on the maximum global clock rate. The synchronization algorithm can tolerate any physical failure: it continues to operate correctly on any connected segment of the network, i.e., it tolerates any number of link and node failures as long as the network remains connected.
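The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea of deriving a common tick count from free-running local clocks by neighbor-to-neighbor signaling. It assumes a simple windowed gating rule that is not taken from the paper: a node emits its next tick only if every neighbor's tick count, as seen through the link delay, is no more than `window` ticks behind its own. The function `simulate` and all parameter names are hypothetical, and time is modeled in abstract integer steps.

```python
from bisect import bisect_right

def simulate(adj, period, delay, window, steps):
    """Toy neighbor-gated tick counters (illustrative assumption, not the paper's algorithm).

    adj[i]    : list of neighbors of node i
    period[i] : local clock period of node i, in simulation steps
    delay     : link propagation delay in steps (must be >= 1)
    window    : how many ticks a node may run ahead of what it has heard
    Returns the number of ticks each node has emitted after `steps` steps.
    """
    n = len(adj)
    tick_times = [[] for _ in range(n)]  # emission time of every tick, per node
    next_due = list(period)              # earliest time the local clock allows a tick

    for t in range(steps):
        for i in range(n):
            if t < next_due[i]:
                continue  # local clock has not completed a period yet
            own = len(tick_times[i])
            # Ticks of neighbor j that have had time to arrive are those
            # emitted at or before t - delay.
            heard = (bisect_right(tick_times[j], t - delay) for j in adj[i])
            if all(h >= own - window for h in heard):
                tick_times[i].append(t)
                next_due[i] = t + period[i]
    return [len(ticks) for ticks in tick_times]

# Three nodes in a line whose local clocks disagree (periods 9, 10 and 12),
# with a link delay roughly ten times longer than one tick.
counts = simulate([[1], [0, 2], [1]], period=[9, 10, 12],
                  delay=100, window=15, steps=20000)
print(counts)
```

Under this rule a fast clock is held back by its slower neighbors, so the counts of adjacent nodes never differ by more than `window + 1`, even though the tick period is much shorter than the link delay. The sketch omits failure handling and delay measurement, both of which the abstract attributes to the hardware network interface.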
