Remus: High Availability via Asynchronous Virtual Machine Replication (Best Paper)

Allowing applications to survive hardware failure is an expensive undertaking, which generally involves reengineering software to include complicated recovery logic as well as deploying special-purpose hardware; this represents a severe barrier to improving the dependability of large or legacy applications. We describe the construction of a general and transparent high availability service that allows existing, unmodified software to be protected from the failure of the physical machine on which it runs. Remus provides an extremely high degree of fault tolerance, to the point that a running system can transparently continue execution on an alternate physical host in the face of failure with only seconds of downtime, while completely preserving host state such as active network connections. Our approach encapsulates protected software in a virtual machine, asynchronously propagates changed state to a backup host at frequencies as high as forty times a second, and uses speculative execution to concurrently run the active VM slightly ahead of the replicated system state.
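The checkpoint-and-speculate cycle described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the Remus implementation: the `Primary` and `Backup` classes, the key/value "VM state", and the `replicate` helper are all hypothetical names introduced here. The sketch also assumes that externally visible output is held back until the checkpoint covering it has reached the backup, which is one way to keep speculative execution invisible to clients. The only figure taken from the text is the checkpoint rate of up to forty per second, which corresponds to a 25 ms epoch.

```python
from dataclasses import dataclass, field

# Forty checkpoints per second (from the abstract) implies a 25 ms epoch.
CHECKPOINTS_PER_SECOND = 40
EPOCH_MS = 1000 / CHECKPOINTS_PER_SECOND


@dataclass
class Backup:
    """Hypothetical backup host: holds the last fully received checkpoint."""
    state: dict = field(default_factory=dict)

    def apply(self, delta: dict) -> None:
        # Only changed state is shipped, so applying a checkpoint is a merge.
        self.state.update(delta)


@dataclass
class Primary:
    """Hypothetical protected VM, modeled as a key/value store with dirty tracking."""
    state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    pending_output: list = field(default_factory=list)   # produced, not yet visible
    released_output: list = field(default_factory=list)  # visible to the outside

    def write(self, key, value) -> None:
        self.state[key] = value
        self.dirty.add(key)

    def send(self, packet) -> None:
        # Assumption of this sketch: speculative output is buffered, not sent.
        self.pending_output.append(packet)

    def capture(self):
        """End an epoch: snapshot the changed state and the output it produced."""
        delta = {key: self.state[key] for key in self.dirty}
        outputs = self.pending_output
        self.dirty = set()
        self.pending_output = []
        return delta, outputs


def replicate(primary: Primary, backup: Backup, checkpoint) -> None:
    """Deliver one in-flight checkpoint, then release the output it covers."""
    delta, outputs = checkpoint
    backup.apply(delta)
    primary.released_output.extend(outputs)


def demo():
    primary, backup = Primary(), Backup()

    # Epoch 1 runs, then its changes are captured for replication.
    primary.write("x", 1)
    primary.send("pkt-1")
    checkpoint = primary.capture()

    # Speculative execution: the primary keeps running epoch 2 while the
    # epoch-1 checkpoint is still in flight to the backup.
    primary.write("x", 2)
    primary.send("pkt-2")

    # The checkpoint arrives; only epoch-1 state and output become durable.
    replicate(primary, backup, checkpoint)
    return primary, backup
```

After `demo()` the primary holds `x = 2` while the backup holds `x = 1`, and only `pkt-1` has been released. If the primary failed at that point, the backup would resume from a state consistent with everything the outside world has observed.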
