Sailfish: accelerating cloud-scale multi-tenant multi-service gateways with programmable switches

The cloud gateway is essential in the public cloud as the central hub of cloud traffic. We show that horizontal scaling of software gateways, although sustainable for years, is no longer future-proof given the massive scale and rapid growth of today's cloud. The root cause is the stagnant performance of the CPU core, which is easily overloaded by heavy hitters as traffic growth goes far beyond Moore's law. To address this, we propose Sailfish, a cloud-scale multi-tenant multi-service gateway accelerated by programmable switches. The new challenge is that the large forwarding tables required by multi-tenancy cannot fit into the limited on-chip memory. To this end, we devise a multi-pronged approach with (1) hardware/software co-design for table sharing, (2) horizontal table splitting among gateway clusters, and (3) pipeline-aware table compression for a single node. Compared with an x86 gateway of a similar price, Sailfish reduces latency by 95% (2μs) and improves throughput by more than 20x in bps (3.2Tbps) and 71x in pps (1.8Gpps) with packet length < 256B. Sailfish has been deployed in Alibaba Cloud for more than two years. It is the first P4-based cloud gateway in the industry, and a single cluster carries dozens of Tbps of traffic, withstanding peak-hour traffic during large online shopping festivals.
