Software-Based Fault-Tolerant Clock Synchronization for Distributed UNIX Environments

Fault-tolerant clock synchronization is often used in distributed systems with requirements such as close interaction between their components, measurement of elapsed time, and ordering of events in the system. Different implementation approaches can be used to achieve fault-tolerant clock synchronization, depending on criteria such as performance, cost, and availability of hardware and operating system support. This paper describes a low-cost, application-level implementation that synchronizes the clocks of UNIX-based computers connected via an Ethernet local area network. The implementation is highly portable and modular, allowing the use of different fault-tolerant, interactive convergence clock synchronization algorithms. Experimental results obtained with the Fault-Tolerant Midpoint Algorithm (FTMA) [1] are presented and compared against typical results of hardware and operating system-based implementations. We show that a software implementation of the FTMA algorithm is sensitive to environmental aspects such as process priority, CPU load, and network load, and that it is possible to obtain a more stable algorithm by using a technique referred to as adaptive exponential averaging. Finally, we compare our fault-tolerant implementation with a synchronization protocol used in the Internet, namely the NTP (Network Time Protocol).
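As an illustration of the convergence step that the fault-tolerant midpoint approach is generally understood to use, the sketch below discards the f lowest and f highest clock-offset readings and returns the midpoint of the remaining extremes. The function name, the argument names, and the 3f + 1 check are assumptions made for this example; it is not taken from the implementation described in the paper, and it does not include the adaptive exponential averaging mentioned above.

```python
def fault_tolerant_midpoint(offsets, f):
    """Midpoint of the readings left after dropping the f lowest and f highest.

    offsets: estimated clock differences to each node (hypothetical input, in seconds)
    f:       maximum number of faulty clocks to tolerate (assumed parameter)
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    # Assumption: the usual n >= 3f + 1 requirement for this class of algorithms.
    if len(offsets) < 3 * f + 1:
        raise ValueError("need at least 3f + 1 readings to tolerate f faults")
    ordered = sorted(offsets)
    # Drop the f smallest and f largest readings, which may come from faulty clocks.
    trimmed = ordered[f:len(ordered) - f]
    # The correction is the midpoint of the surviving extremes.
    return (trimmed[0] + trimmed[-1]) / 2.0


# Four readings, one of them wildly off; tolerate a single fault.
correction = fault_tolerant_midpoint([-2.0, 1.0, 3.0, 250.0], f=1)
print(correction)  # 2.0, the midpoint of the surviving readings 1.0 and 3.0
```

Trimming before taking the midpoint bounds the influence that any f faulty readings can have on the correction, which is why the outlier of 250.0 does not affect the result here.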

[1] Nancy A. Lynch, et al. A new fault-tolerant algorithm for clock synchronization, 1984, PODC '84.

[2] P. M. Melliar-Smith, et al. Synchronizing clocks in the presence of faults, 1985, JACM.

[3] Flaviu Cristian. A probabilistic approach to distributed clock synchronization, 1989, Proceedings. The 9th International Conference on Distributed Computing Systems.

[4] Hermann Kopetz, et al. Clock Synchronization in Distributed Real-Time Systems, 1987, IEEE Transactions on Computers.

[5] Fred B. Schneider, et al. Inexact agreement: accuracy, precision, and graceful degradation, 1985, PODC '85.

[6] Hermann Kopetz, et al. Distributed fault-tolerant real-time systems: the Mars approach, 1989, IEEE Micro.

[7] Bjarne Stroustrup, et al. C++ Programming Language, 1986, IEEE Softw.

[8] IEEE Transactions on Computers, Computing in Science & Engineering.

[9] Nancy A. Lynch, et al. An Upper and Lower Bound for Clock Synchronization, 1984, Inf. Control.

[10] Nancy A. Lynch, et al. Reaching approximate agreement in the presence of faults, 1986, JACM.

[11] David L. Mills. Measured performance of the Network Time Protocol in the Internet system, 1989, RFC.

[12] Chris J. Walter, et al. The MAFT Architecture for Distributed Fault Tolerance, 1988, IEEE Trans. Computers.

[13] Keith Marzullo, et al. Maintaining the time in a distributed system, 1985, OPSR.

[14] Manfred Johannes Pfluegl. Clock synchronization in fault-tolerant systems, 1992.

[15] Douglas M. Blough, et al. A New and Improved Algorithm for Fault-Tolerant Clock Synchronization, 1995, J. Parallel Distributed Comput.