Anna: A KVS for Any Scale

Modern cloud providers offer dense hardware with multiple cores and large memories, hosted in global platforms. This raises the challenge of implementing high-performance software systems that can effectively scale from a single core to multicore to the globe. Conventional wisdom says that software designed for one scale point needs to be rewritten when scaling up by 10-100x. In contrast, we explore how a system can be architected to scale across many orders of magnitude by design. We study this challenge in the context of a new key-value store (KVS) called Anna: a partitioned, multi-mastered system that achieves high performance and elasticity via wait-free execution and coordination-free consistency. Our design rests on a simple architecture of coordination-free actors that perform state updates by merging lattice-based composite data structures. We demonstrate that a wide variety of consistency models can be elegantly implemented in this architecture with unprecedented consistency, smooth fine-grained elasticity, and performance that far exceeds the state of the art.
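To make the idea of state update via lattice merge concrete, the following is a minimal sketch in Python. It is not Anna's actual API: the class names `LWW` and `MapLattice`, the `(clock, replica_id)` timestamp layout, and the assumption that timestamps are unique per write are all illustrative choices. The sketch composes a last-writer-wins register inside a map lattice and checks that merge is commutative, associative, and idempotent, which is the property that lets replicas apply updates in any order without coordinating.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class LWW:
    """Last-writer-wins register (hypothetical name).

    The timestamp is a (clock, replica_id) tuple, assumed unique per write,
    so comparing timestamps yields a deterministic winner.
    """
    timestamp: Tuple[int, str]
    value: Any

    def merge(self, other: "LWW") -> "LWW":
        # Keep whichever write carries the larger timestamp.
        return self if self.timestamp >= other.timestamp else other


@dataclass
class MapLattice:
    """Map from key to a lattice value; merge is applied key by key."""
    entries: Dict[str, LWW]

    def merge(self, other: "MapLattice") -> "MapLattice":
        merged = dict(self.entries)
        for key, incoming in other.entries.items():
            # Keys present on both sides are merged; new keys are adopted.
            merged[key] = merged[key].merge(incoming) if key in merged else incoming
        return MapLattice(merged)


# Three replicas that have each seen different writes.
a = MapLattice({"x": LWW((1, "r1"), "apple")})
b = MapLattice({"x": LWW((2, "r2"), "banana"), "y": LWW((1, "r2"), "cherry")})
c = MapLattice({"y": LWW((3, "r3"), "date")})

# Merge order does not matter, and re-merging the same state is harmless.
assert a.merge(b) == b.merge(a)                    # commutative
assert a.merge(b).merge(c) == a.merge(b.merge(c))  # associative
assert a.merge(a) == a                             # idempotent

final = a.merge(b).merge(c)
```

Because every replica reaches the same `final` state whatever order the updates arrive in, an actor in this style can apply incoming state as soon as it is received rather than waiting on locks or consensus. Other consistency models would swap in a different lattice for the map's values under the same merge interface.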
