Anna: A KVS for Any Scale

Modern cloud providers offer dense hardware with multiple cores and large memories, hosted in global platforms. This raises the challenge of implementing high-performance software systems that can effectively scale from a single core to multicore to the globe. Conventional wisdom says that software designed for one scale point needs to be rewritten when scaling up by 10-100× [1]. In contrast, we explore how a system can be architected to scale across many orders of magnitude by design. We explore this challenge in the context of a new key-value store system called Anna: a partitioned, multi-mastered system that achieves high performance and elasticity via wait-free execution and coordination-free consistency. Our design rests on a simple architecture of coordination-free actors that perform state updates by merging lattice-based composite data structures. We demonstrate that a wide variety of consistency models can be elegantly implemented in this architecture with unprecedented consistency, smooth fine-grained elasticity, and performance that far exceeds the state of the art.
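To make the idea of state update via lattice merge concrete, the following is a minimal sketch, not Anna's actual implementation or API. The class names `LWWLattice` and `MapLattice` and the integer timestamps are assumptions introduced only for illustration. The sketch shows why a merge that is associative, commutative, and idempotent lets replicas apply updates in any order, without coordination, and still converge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWLattice:
    """Hypothetical last-writer-wins lattice: a (timestamp, value) pair."""
    timestamp: int
    value: str

    def merge(self, other: "LWWLattice") -> "LWWLattice":
        # Keep the entry with the larger timestamp. Ties are broken on the
        # value so that merge stays commutative.
        return max(self, other, key=lambda e: (e.timestamp, e.value))


class MapLattice:
    """Hypothetical composite lattice: a map from keys to lattice values."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def merge(self, other: "MapLattice") -> "MapLattice":
        # Key-wise merge: shared keys merge their values, other keys are unioned.
        merged = dict(self.entries)
        for key, val in other.entries.items():
            merged[key] = merged[key].merge(val) if key in merged else val
        return MapLattice(merged)

    def __eq__(self, other):
        return isinstance(other, MapLattice) and self.entries == other.entries


# Three updates, as they might arrive at different actors (illustrative data).
a = MapLattice({"x": LWWLattice(1, "apple"), "y": LWWLattice(5, "pear")})
b = MapLattice({"x": LWWLattice(3, "banana")})
c = MapLattice({"y": LWWLattice(2, "plum"), "z": LWWLattice(1, "kiwi")})

# Two replicas see the same updates in different orders.
replica_1 = a.merge(b).merge(c)
replica_2 = c.merge(b).merge(a)

# Both replicas reach the same state without coordinating.
assert replica_1 == replica_2
```

Because the map's merge is built from the merge of its values, composing lattices in this way preserves the three algebraic properties, which is the sense in which the data structures are composite.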

[1] H. Garcia-Molina et al. Sagas, 1987, SIGMOD Conference.

[2] David Maier et al. Logic and lattices for distributed programming, 2012, SoCC '12.

[3] Daniel J. Abadi et al. Rethinking serializable multiversion concurrency control, 2014, Proc. VLDB Endow.

[4] Michael Stonebraker et al. H-store: a high-performance, distributed main memory transaction processing system, 2008, Proc. VLDB Endow.

[5] Eddie Kohler et al. Cache craftiness for fast multicore key-value storage, 2012, EuroSys '12.

[6] Beng Chin Ooi et al. Towards elastic transactional cloud storage with range query support, 2010, Proc. VLDB Endow.

[7] Ali Ghodsi et al. Coordination Avoidance in Database Systems, 2014, Proc. VLDB Endow.

[8] Adam Silberstein et al. Benchmarking cloud serving systems with YCSB, 2010, SoCC '10.

[9] Yuen Ren Chao et al. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, 1950.

[10] Alexander J. Smola et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.

[11] Ali Ghodsi et al. Bolt-on causal consistency, 2013, SIGMOD '13.

[12] Werner Vogels et al. Dynamo: Amazon's highly available key-value store, 2007, SOSP.

[13] Leslie Lamport et al. The part-time parliament, 1998, TOCS.

[14] Michael J. Freedman et al. Don't settle for eventual: scalable causal consistency for wide-area storage with COPS, 2011, SOSP.

[15] Marc Shapiro et al. Conflict-Free Replicated Data Types, 2011, SSS.

[16] Dror G. Feitelson et al. Workload Modeling for Performance Evaluation, 2002, Performance.

[17] Jeffrey Dean et al. Challenges in building large-scale information retrieval systems: invited talk, 2009, WSDM '09.

[18] Reactors: A Case for Predictable, Virtualized OLTP Actor Database Systems, 2017, ArXiv.

[19] C. J. Clark et al. Courtship dives of Anna's hummingbird offer insights into flight performance limits, 2009, Proceedings of the Royal Society B: Biological Sciences.

[20] Joseph M. Hellerstein et al. Consistency Analysis in Bloom: a CALM and Collected Approach, 2011, CIDR.

[21] Marvin Theimer et al. Managing update conflicts in Bayou, a weakly connected replicated storage system, 1995, SOSP.

[22] Joseph M. Hellerstein et al. Eliminating Boundaries in Cloud Storage with Anna, 2018, ArXiv.

[23] Hyeontaek Lim et al. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage, 2014, NSDI.

[24] Carl Hewitt et al. A Universal Modular ACTOR Formalism for Artificial Intelligence, 1973, IJCAI.

[25] Andrey Kolesnikov et al. Load Modeling and Generation for IP-Based Networks: A Unified Approach and Tool Support, 2010, MMB/DFT.

[26] David E. Culler et al. SEDA: an architecture for well-conditioned, scalable internet services, 2001, SOSP.

[27] Hans-Arno Jacobsen et al. PNUTS: Yahoo!'s hosted data serving platform, 2008, Proc. VLDB Endow.

[28] Robert Morris et al. Non-scalable locks are dangerous, 2012.

[29] Ali Ghodsi et al. Highly Available Transactions: Virtues and Limitations, 2013, Proc. VLDB Endow.

[30] Fred B. Schneider et al. Implementing fault-tolerant services using the state machine approach: a tutorial, 1990, CSUR.

[31] Daniel J. Abadi et al. Design Principles for Scaling Multi-core OLTP Under High Contention, 2015, SIGMOD Conference.

[32] Tom W. Keller et al. Data placement in Bubba, 1988, SIGMOD '88.

[33] David J. DeWitt et al. Chained declustering: a new availability strategy for multiprocessor database machines, 1990, Proceedings of the Sixth International Conference on Data Engineering.

[34] Daniel J. Abadi et al. Latch-free Synchronization in Database Systems: Silver Bullet or Fool's Gold?, 2017, CIDR.

[35] Sudipta Sengupta et al. The Bw-Tree: A B-tree for new hardware platforms, 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[36] David Maier et al. Indexing in an Actor-Oriented Database, 2017, CIDR.

[37] Pradeep Dubey et al. PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors, 2011, Proc. VLDB Endow.

[38] Adrian Schüpbach et al. The multikernel: a new OS architecture for scalable multicore systems, 2009, SOSP '09.

[39] John K. Ousterhout et al. In Search of an Understandable Consensus Algorithm, 2014, USENIX ATC.