Reducing Downtime Due to System Maintenance and Upgrades (Awarded Best Student Paper!)

Patching, upgrading, and maintaining operating system software is a growing management problem that can result in unacceptable system downtime. We introduce AutoPod, a system that enables unscheduled operating system updates while preserving application service availability. AutoPod provides a group of processes and associated users with an isolated, machine-independent virtualized environment that is decoupled from the underlying operating system instance. This virtualized environment is integrated with a novel checkpoint-restart mechanism that allows processes to be suspended, resumed, and migrated across operating system kernel versions with different security and maintenance patches. AutoPod incorporates a system status service that determines when operating system patches need to be applied to the current host, then automatically migrates application services to another host to preserve their availability while the current host is updated and rebooted. We have implemented AutoPod on Linux without requiring any application or operating system kernel changes. Our measurements on real-world desktop and server applications demonstrate that AutoPod imposes little overhead and provides sub-second suspend and resume times that can be an order of magnitude faster than starting applications after a system reboot. AutoPod enables systems to autonomically stay updated with relevant maintenance and security patches while ensuring no loss of data and minimizing service disruption.
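The maintenance workflow described in the abstract (detect that a patch is needed, checkpoint the running services, restart them on another host, then update and reboot the original host) can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `Pod`, `Host`, `maintain`, and the integer patch level are hypothetical names invented for illustration, and the "checkpoint" here is a plain dictionary copy rather than AutoPod's actual checkpoint-restart mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical stand-in for a group of processes and their state."""
    name: str
    state: dict = field(default_factory=dict)


@dataclass
class Host:
    """Hypothetical host; patch level is modeled as a single integer."""
    name: str
    kernel_patch_level: int
    pods: dict = field(default_factory=dict)

    def needs_patch(self, latest_patch_level: int) -> bool:
        # Plays the role of the system status service in this sketch.
        return self.kernel_patch_level < latest_patch_level

    def checkpoint(self, pod_name: str) -> dict:
        # Suspend the pod and capture its state as a host-independent image.
        pod = self.pods.pop(pod_name)
        return {"name": pod.name, "state": dict(pod.state)}

    def restart(self, image: dict) -> None:
        # Resume a pod from an image, regardless of this host's patch level.
        self.pods[image["name"]] = Pod(image["name"], dict(image["state"]))

    def patch_and_reboot(self, latest_patch_level: int) -> None:
        if self.pods:
            raise RuntimeError("migrate all pods before rebooting")
        self.kernel_patch_level = latest_patch_level


def maintain(current: Host, spare: Host, latest_patch_level: int) -> bool:
    """Migrate every pod off `current`, then patch it. Returns True if patched."""
    if not current.needs_patch(latest_patch_level):
        return False
    for pod_name in list(current.pods):
        spare.restart(current.checkpoint(pod_name))
    current.patch_and_reboot(latest_patch_level)
    return True
```

In this toy model the spare host may run a different patch level than the source, which mirrors the abstract's point that pods migrate across kernel versions. A real implementation would have to capture process, file, and network state, which the sketch does not attempt.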
