Automatic software upgrades for distributed systems

Upgrading the software of long-lived, highly available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting the system for an upgrade is unacceptable. Instead, upgrades may happen gradually, and there may be long periods during which different nodes run different software versions and must communicate using incompatible protocols. We present a methodology and infrastructure that address these challenges and make it possible to upgrade distributed systems automatically while limiting service disruption. Our methodology defines how to enable nodes to interoperate across versions, how to preserve the state of a system across upgrades, and how to schedule an upgrade so as to limit service disruption. The approach is modular: defining an upgrade requires understanding only the new software and the version it replaces. The upgrade infrastructure is a generic platform for distributing and installing software while enabling nodes to interoperate across versions. The infrastructure requires no access to the system source code and is transparent: node software is unaware that different versions even exist. We have implemented a prototype of the infrastructure, called Upstart, that intercepts socket communication using a dynamically linked C++ library. Experiments show that Upstart has low overhead and works well for both local-area and Internet systems. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
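
The three concerns named in the abstract (scheduling, state preservation, and cross-version interoperation) can be illustrated with a minimal Python sketch. This is not Upstart's API or the author's implementation: every name here (`Node`, `rolling_schedule`, `upgrade_node`, `translate_request`, and the example state and message fields) is a hypothetical chosen for illustration, and the batch-based schedule is only one simple policy for limiting disruption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Request = Dict[str, object]
State = Dict[str, object]


@dataclass
class Node:
    """Hypothetical node record: a name, a software version, and persistent state."""
    name: str
    version: int
    state: State = field(default_factory=dict)


def rolling_schedule(nodes: List[Node], max_down: int) -> List[List[Node]]:
    """Split nodes into batches so that at most max_down nodes restart at once."""
    if max_down < 1:
        raise ValueError("max_down must be at least 1")
    return [nodes[i:i + max_down] for i in range(0, len(nodes), max_down)]


def upgrade_node(node: Node, new_version: int,
                 transform: Callable[[State], State]) -> None:
    """Move a node to new_version, converting its state to the new representation."""
    node.state = transform(node.state)
    node.version = new_version


def translate_request(caller_version: int, target: Node, request: Request,
                      adapters: Dict[Tuple[int, int], Callable[[Request], Request]]
                      ) -> Request:
    """Convert a request from the caller's version into the target's version."""
    if caller_version == target.version:
        return request
    adapter = adapters.get((caller_version, target.version))
    if adapter is None:
        raise LookupError(
            f"no adapter from version {caller_version} to {target.version}")
    return adapter(request)


# Assumed example upgrade: version 2 renames the "count" field to "total".
def v1_state_to_v2(state: State) -> State:
    new_state = dict(state)
    new_state["total"] = new_state.pop("count")
    return new_state


def v1_request_to_v2(request: Request) -> Request:
    if request.get("op") == "get_count":
        return {**request, "op": "get_total"}
    return request


nodes = [Node(f"n{i}", version=1, state={"count": i}) for i in range(5)]
batches = rolling_schedule(nodes, max_down=2)

# Upgrade only the first batch, leaving the system in a mixed-version state.
for node in batches[0]:
    upgrade_node(node, new_version=2, transform=v1_state_to_v2)

# A version-1 caller can still talk to an upgraded node through the adapter.
adapters = {(1, 2): v1_request_to_v2}
translated = translate_request(1, nodes[0], {"op": "get_count"}, adapters)
```

In this sketch the translation happens in a layer outside the node logic, which loosely mirrors the abstract's point that node software need not know other versions exist; how Upstart actually performs this at the socket level is not described here.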
