Sundial: Fault-tolerant Clock Synchronization for Datacenters

Clock synchronization is critical for many datacenter applications, such as distributed transactional databases, consistent snapshots, and network telemetry. As application performance requirements increase and datacenter networks move toward ultra-low latency, we need a sub-microsecond-level bound on time-uncertainty to reduce transaction delay and enable new network management applications (e.g., measuring one-way delay for congestion control). State-of-the-art clock synchronization solutions focus on improving clock precision but may incur a significant time-uncertainty bound in the presence of failures. This significantly affects applications because temperature-related, link, device, and domain failures are common in large-scale datacenters. We present Sundial, a fault-tolerant clock synchronization system for datacenters that achieves a ∼100 ns time-uncertainty bound under various types of failures. Sundial provides fast failure detection based on frequent synchronization messages in hardware. Sundial enables fast failure recovery using a novel graph-based algorithm to precompute a backup plan that is generic across failures. Through experiments in a >500-machine testbed and large-scale simulations, we show that Sundial can achieve a ∼100 ns time-uncertainty bound under different types of failures, which is more than two orders of magnitude lower than state-of-the-art solutions. We also demonstrate the benefit of Sundial on applications such as Spanner and Swift congestion control.
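To illustrate why frequent synchronization messages help bound time-uncertainty, the following is a minimal sketch rather than Sundial's actual implementation. It assumes the common model in which a clock's uncertainty grows linearly with the time since its last successful synchronization, at a rate set by an assumed maximum drift. The function name, the 200 ppm drift rate, and the intervals are illustrative assumptions, not values taken from the abstract.

```python
def uncertainty_bound_ns(elapsed_s: float, max_drift_ppm: float, base_ns: float = 0.0) -> float:
    """Worst-case time-uncertainty (ns) after `elapsed_s` seconds without a sync.

    Assumed linear model: bound = base error + max drift rate * elapsed time.
    1 ppm of drift accumulates 1 microsecond (1000 ns) of error per second.
    """
    return base_ns + max_drift_ppm * elapsed_s * 1000.0


if __name__ == "__main__":
    MAX_DRIFT_PPM = 200.0  # hypothetical worst-case oscillator drift

    # Hypothetical sync intervals: 100 microseconds, 10 milliseconds, 1 second.
    for interval_s in (100e-6, 10e-3, 1.0):
        bound = uncertainty_bound_ns(interval_s, MAX_DRIFT_PPM)
        print(f"interval {interval_s:g} s: bound grows by about {bound:.0f} ns")
```

Under this assumed model, shortening the interval between synchronization messages shrinks the drift term proportionally, and a missed message is noticed sooner, which is the intuition behind detecting failures quickly.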
