Spanner: Google's Globally-Distributed Database

Spanner is Google’s scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free snapshot transactions, and atomic schema changes, across all of Spanner.
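To make the idea of a time API that exposes clock uncertainty concrete, here is a minimal sketch in Python. It is not Spanner's actual interface: the names `TimeInterval`, `UncertainClock`, `epsilon_s`, and `commit_wait` are hypothetical, and the fixed uncertainty bound is an assumption made for illustration. The sketch assumes the clock returns an interval guaranteed to contain the true time, and shows one way such an interval could be used to order commits: pick a timestamp, then wait until that timestamp has definitely passed before making the commit visible.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    """An interval assumed to contain the true current time."""
    earliest: float
    latest: float


class UncertainClock:
    """Hypothetical clock that reports its uncertainty instead of hiding it.

    epsilon_s is an assumed, fixed error bound in seconds; a real system
    would derive this bound from its time sources.
    """

    def __init__(self, epsilon_s, source=time.time):
        self.epsilon_s = epsilon_s
        self._source = source

    def now(self):
        t = self._source()
        return TimeInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, t):
        """True only if t has definitely passed."""
        return self.now().earliest > t

    def before(self, t):
        """True only if t has definitely not arrived yet."""
        return self.now().latest < t


def commit_wait(clock, sleep=time.sleep):
    """Pick a commit timestamp, then block until it is definitely in the past.

    Any transaction that starts after this function returns will observe a
    clock interval lying entirely after the returned timestamp.
    """
    commit_ts = clock.now().latest
    step = max(clock.epsilon_s / 4, 0.0001)
    while not clock.after(commit_ts):
        sleep(step)
    return commit_ts
```

Under these assumptions the wait lasts roughly twice the uncertainty bound, which is why a tighter bound translates directly into lower commit latency. The `source` and `sleep` parameters exist only so the sketch can be driven by a fake clock in tests.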
