Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases

There is a gap between the theory and practice of distributed systems in their use of time. The theory shunned the notion of time and introduced “causality tracking” as a clean abstraction for reasoning about concurrency. Practical systems employed physical time (NTP) information, but only in a best-effort manner, because tight clock synchronization is difficult to achieve. To bridge this gap and reconcile theory and practice on the topic of time, we propose a hybrid logical clock, HLC, that combines the best of logical clocks and physical clocks. HLC captures the causality relationship like logical clocks, and enables easy identification of consistent snapshots in distributed systems. Dually, HLC can be used in lieu of physical/NTP clocks, since it keeps its logical clock always close to the NTP clock. Moreover, HLC fits into the 64-bit NTP timestamp format, and is masking tolerant to NTP kinks and uncertainties. We show that HLC has many benefits for wait-free transaction ordering and for performing snapshot reads in multiversion globally distributed databases.
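The abstract does not spell out the clock update rules, so the following is only a minimal sketch of how a hybrid logical clock of this kind is commonly realized. It assumes a timestamp is a pair (l, c), where l tracks the largest physical time seen so far and c is a counter that breaks ties among events sharing the same l. The class name `HybridLogicalClock`, the methods `now` and `update`, and the injected `physical_time` callable are illustrative names, not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# An HLC timestamp as assumed here: (l, c), compared lexicographically.
Timestamp = Tuple[int, int]


@dataclass
class HybridLogicalClock:
    # Hypothetical hook returning the node's physical clock reading
    # (e.g. NTP-disciplined time in some integer unit).
    physical_time: Callable[[], int]
    l: int = 0  # largest physical time observed so far
    c: int = 0  # counter ordering events that share the same l

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        pt = self.physical_time()
        if pt > self.l:
            # Physical time moved ahead: adopt it and reset the counter.
            self.l, self.c = pt, 0
        else:
            # Physical time stalled or lagging: advance logically.
            self.c += 1
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Timestamp a receive event for a message stamped `remote`."""
        l_m, c_m = remote
        pt = self.physical_time()
        l_old = self.l
        self.l = max(l_old, l_m, pt)
        if self.l == l_old and self.l == l_m:
            self.c = max(self.c, c_m) + 1
        elif self.l == l_old:
            self.c += 1
        elif self.l == l_m:
            self.c = c_m + 1
        else:
            # The local physical clock is strictly ahead of both.
            self.c = 0
        return (self.l, self.c)
```

Under these assumptions, a node whose physical clock lags the sender's still assigns the receive event a larger timestamp than the send event, so tuple comparison respects causality. A snapshot read at time t could then, in principle, select on every node the versions whose timestamps do not exceed (t, c) for a chosen c.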
