Rollback-Recovery for Middleboxes

Network middleboxes must offer high availability, with automatic failover when a device fails. Achieving high availability is challenging because failover must correctly restore lost state (e.g., activity logs, port mappings), must do so quickly (e.g., in less than typical transport timeout values, to minimize disruption to applications), and must add little overhead to failure-free operation (e.g., additional per-packet latencies of 10-100s of $\mu$s). No existing middlebox design provides failover that is correct, recovers quickly, and imposes little added latency on failure-free operation. We present a new design for fault tolerance in middleboxes that achieves these three goals. Our system, FTMB (for Fault-Tolerant MiddleBox), adopts the classical approach of "rollback recovery", in which a system uses information logged during normal operation to correctly reconstruct state after a failure. However, traditional rollback recovery cannot maintain high throughput given how frequently middleboxes emit output. Hence, we design a novel solution for recording middlebox state that relies on two mechanisms: (1) "ordered logging", which provides lightweight logging of the information needed after recovery, and (2) a "parallel release" algorithm which, when coupled with ordered logging, ensures that recovery is always correct. We implement ordered logging and parallel release in Click and show that, for our test applications, our design adds only 30 $\mu$s to median per-packet latency. Our system introduces moderate throughput overheads (5-30%) and can reconstruct lost state in 40-275 ms for practical systems.
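
As a rough illustration of the classical rollback-recovery idea mentioned above, the following Python sketch logs each input during normal operation and, after a simulated failure, rebuilds state by replaying the log on top of the last checkpoint. It is a single-threaded toy under stated assumptions: `ToyNat`, `RollbackRecovery`, and all method names are hypothetical, and it does not attempt to reproduce FTMB's ordered logging or parallel release.

```python
import copy


class ToyNat:
    """Hypothetical stateful middlebox: assigns one external port per flow."""

    def __init__(self, first_port=5000):
        self.next_port = first_port
        self.mappings = {}

    def process(self, flow):
        # Deterministic given the same sequence of inputs.
        if flow not in self.mappings:
            self.mappings[flow] = self.next_port
            self.next_port += 1
        return self.mappings[flow]


class RollbackRecovery:
    """Generic checkpoint-plus-log wrapper (illustrative, not FTMB itself)."""

    def __init__(self, middlebox):
        self.middlebox = middlebox
        self.checkpoint = copy.deepcopy(middlebox)
        self.input_log = []

    def handle(self, flow):
        # Log the input before any output is released.
        self.input_log.append(flow)
        return self.middlebox.process(flow)

    def take_checkpoint(self):
        # Snapshot current state; inputs before this point need no replay.
        self.checkpoint = copy.deepcopy(self.middlebox)
        self.input_log.clear()

    def recover(self):
        # Restore the snapshot, then replay inputs logged since it was taken.
        replica = copy.deepcopy(self.checkpoint)
        for flow in self.input_log:
            replica.process(flow)
        self.middlebox = replica
        return replica


rr = RollbackRecovery(ToyNat())
rr.handle("10.0.0.1:1234")
rr.take_checkpoint()
rr.handle("10.0.0.2:80")
before = dict(rr.middlebox.mappings)

rr.middlebox = None          # simulate losing the live instance
recovered = rr.recover()
assert recovered.mappings == before
```

Replay reproduces the lost state here only because the toy middlebox is deterministic and single-threaded; the abstract's point is that this classical scheme becomes too costly at middlebox output rates, which is what motivates the two mechanisms it names.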
