Pico replication: a high availability framework for middleboxes

Middleboxes are being rearchitected to be service-oriented, composable, extensible, and elastic. Yet system-level support for high availability (HA) continues to introduce significant performance overhead. In this paper, we propose Pico Replication (PR), a system-level framework for middleboxes that exploits their flow-centric structure to achieve low-overhead, fully customizable HA. Unlike generic (virtual machine level) techniques, PR operates at the flow level. Individual flows can be checkpointed at very high frequencies while the middlebox continues to process other flows. Furthermore, each flow can have its own checkpoint frequency, output buffer, and backup target, enabling rich and diverse policies that balance performance and utilization on a per-flow basis. PR leverages OpenFlow to provide near-instant flow-level failure recovery by dynamically rerouting a flow's packets to its replication target. We have implemented PR and a flow-based HA policy. In controlled experiments, PR sustains checkpoint frequencies of 1000 Hz, an order of magnitude improvement over current VM replication solutions. As a result, PR drastically reduces the end-to-end latency overhead from 280% to 15.5% and the throughput overhead from 99.5% to 3.2%.
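
To make the per-flow model concrete, the following Python sketch illustrates the idea described in the abstract: each flow carries its own checkpoint frequency, output buffer, and backup target, and on failure a flow is rerouted to its backup. It is not the PR implementation. All names (`FlowPolicy`, `FlowContext`, `checkpoint`, `fail_over`) are hypothetical, the routing table is a plain dictionary standing in for OpenFlow rules, and the choice to hold output until the next checkpoint is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical flow identifier: (src_ip, src_port, dst_ip, dst_port).
FlowKey = Tuple[str, int, str, int]


@dataclass
class FlowPolicy:
    checkpoint_hz: float  # per-flow checkpoint frequency
    backup_target: str    # node that stores this flow's checkpoints


@dataclass
class FlowContext:
    policy: FlowPolicy
    state: Dict[str, int] = field(default_factory=dict)
    output_buffer: List[bytes] = field(default_factory=list)
    last_checkpoint: float = 0.0

    def process(self, packet: bytes) -> None:
        # Update per-flow state; output is held until the next checkpoint
        # (an assumption of this sketch).
        self.state["packets"] = self.state.get("packets", 0) + 1
        self.output_buffer.append(packet)

    def checkpoint_due(self, now: float) -> bool:
        # Each flow checks its own period, independent of other flows.
        return now - self.last_checkpoint >= 1.0 / self.policy.checkpoint_hz


def checkpoint(key: FlowKey, ctx: FlowContext,
               backup_store: Dict[str, Dict[FlowKey, Dict[str, int]]],
               now: float) -> List[bytes]:
    """Copy one flow's state to its backup target and release its output."""
    backup_store.setdefault(ctx.policy.backup_target, {})[key] = dict(ctx.state)
    released, ctx.output_buffer = ctx.output_buffer, []
    ctx.last_checkpoint = now
    return released


def fail_over(routes: Dict[FlowKey, str], flows: Dict[FlowKey, FlowContext],
              failed_node: str) -> None:
    """Point every flow served by the failed node at its own backup target.

    The dictionary update stands in for installing a new forwarding rule.
    """
    for key, ctx in flows.items():
        if routes.get(key) == failed_node:
            routes[key] = ctx.policy.backup_target
```

In this sketch a latency-sensitive flow could be given `checkpoint_hz=1000.0` while a bulk transfer uses a much lower value, and the two flows can name different backup targets, so a failure of the primary spreads recovery across replicas.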
