SwiShmem: Distributed Shared State Abstractions for Programmable Switches

Programmable switches provide an appealing platform for running network functions (NFs), such as NATs, firewalls, and DDoS detectors, entirely in the data plane, at staggering multi-Tbps processing rates. However, in real deployments with complex multi-switch topologies, an NF instance must be deployed on each switch, and these instances must together act as a single logical NF. This requirement poses significant challenges, particularly for stateful NFs, because distributed shared NF state must be managed across the switches. While state sharing is considered a solved problem in classical distributed systems, sharing state in the data plane raises several unique challenges: high data rates, limited switch memory, and packet loss. We present the design of SwiShmem, the first distributed shared state management layer for data-plane P4 programs, which facilitates the implementation of stateful distributed NFs on programmable switches. We first analyze the access patterns and consistency requirements of popular NFs that lend themselves to in-switch execution, and then discuss the design and implementation options while highlighting open research questions.
