Extensible distributed coordination

Most services inside a data center are distributed systems that require coordination and synchronization through primitives such as distributed locks and message queues. We argue that extensibility is a crucial feature of the coordination infrastructures used in these systems. Without the ability to extend the functionality of coordination services, applications may end up using sub-optimal coordination algorithms, possibly leading to low performance. Adding extensibility, however, requires mechanisms that constrain extensions so that reasonable security and performance guarantees can still be made. We propose a scheme that enables extensions to be introduced and removed dynamically in a secure way. To avoid performance overheads caused by poorly designed extensions, the scheme constrains the access of extensions to resources. Evaluation results for extensible versions of ZooKeeper and DepSpace show that it is possible to increase the throughput of a distributed queue by more than an order of magnitude (17x for ZooKeeper, 24x for DepSpace) while keeping the underlying coordination kernel small.
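To illustrate why a server-side extension can help a distributed queue, the following is a minimal sketch using a toy in-memory store. It is not the ZooKeeper or DepSpace API, and it does not reproduce the proposed scheme or its security constraints. All names (`ToyStore`, `client_side_dequeue`, `dequeue_extension`) are hypothetical. The sketch only counts client requests: a dequeue built from kernel primitives needs several requests, while the same logic run as an extension needs one.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a coordination service.

    Every client-visible operation increments `requests`, which models
    one round trip to the service.
    """

    def __init__(self):
        self.nodes = {}       # node name -> payload
        self.requests = 0     # number of client round trips so far
        self.extensions = {}  # extension name -> server-side callable

    # Kernel operations: each costs one round trip.
    def create(self, name, data):
        self.requests += 1
        self.nodes[name] = data

    def list_children(self):
        self.requests += 1
        return sorted(self.nodes)

    def read(self, name):
        self.requests += 1
        return self.nodes.get(name)

    def delete(self, name):
        self.requests += 1
        if name in self.nodes:
            del self.nodes[name]
            return True
        return False

    # Extension support (assumed interface, for illustration only).
    def register(self, ext_name, fn):
        self.extensions[ext_name] = fn

    def invoke(self, ext_name):
        self.requests += 1  # one round trip, regardless of the work done
        return self.extensions[ext_name](self.nodes)


def client_side_dequeue(store):
    """Dequeue built from kernel primitives: list, read, delete, retry."""
    while True:
        children = store.list_children()
        if not children:
            return None
        head = children[0]
        data = store.read(head)
        if store.delete(head):
            return data
        # Another client removed the head first; try again.


def dequeue_extension(nodes):
    """Same logic, executed inside the service on its local state."""
    if not nodes:
        return None
    head = min(nodes)
    return nodes.pop(head)
```

In this toy model, and without contention, `client_side_dequeue` issues three requests per element, while `store.invoke("dequeue")` issues one after `store.register("dequeue", dequeue_extension)`. Under contention the client-side version also retries, which widens the gap. The real systems differ in many respects, so the request counts here say nothing about the measured 17x and 24x improvements.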
