Determination of throughput guarantees for processor-based SmartNICs

Programmable network devices are on the rise, with applications ranging from improved network management to accelerating and offloading parts of distributed systems. Processor-based SmartNICs, match-action-based switches, and FPGA devices all offer on-path programmability. Processor-based SmartNICs are easier to program and more versatile than the alternatives, but they have a major drawback: the resulting throughput can vary strongly and is hard to predict, even for the programmer. We close this gap by presenting a methodology that, given a SmartNIC program, determines its achievable throughput in terms of packet rate and bit rate. Our approach combines incremental longest path search with SMT checks to identify the slowest satisfiable program path, from which it derives a lower bound on throughput. Because it analyzes only the slowest program paths, our approach estimates throughput bounds within a few seconds. The evaluation of our prototype on real programs shows that the estimated throughput guarantees are correct with an error of at most 1.7% and provide a tight lower bound for processor- and memory-bottlenecked programs, underestimating throughput by only 8.5% and 18.2%, respectively.
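The combination of longest path search and satisfiability checks can be illustrated with a minimal sketch. It is not the prototype described above: the control-flow graph, the per-block cycle costs, the `is_feasible` callback (a stand-in for an SMT solver query over path constraints), and the simple throughput model in `throughput_bound` (cores times clock rate divided by cycles per packet, ignoring memory bottlenecks) are all hypothetical and chosen for illustration only.

```python
import heapq
from functools import lru_cache


def slowest_feasible_path(graph, cost, entry, is_feasible):
    """Return (cycles, path) for the costliest entry-to-exit path that
    is_feasible accepts, or None if no path is feasible.

    graph: dict mapping a block to its successors (assumed acyclic).
    cost:  dict mapping a block to its cycle cost (illustrative numbers).
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        # Longest remaining cost from node to any exit, ignoring feasibility.
        succs = graph.get(node, [])
        return cost[node] + max((longest_from(s) for s in succs), default=0)

    # Max-heap keyed on an upper bound of the full path cost (negated for heapq).
    heap = [(-longest_from(entry), 0, (entry,))]
    while heap:
        _, spent, path = heapq.heappop(heap)
        node = path[-1]
        spent_here = spent + cost[node]
        succs = graph.get(node, [])
        if not succs:
            # Complete paths are popped in order of decreasing cost, so the
            # first one that passes the check is the slowest feasible path.
            if is_feasible(path):
                return spent_here, list(path)
            continue
        for s in succs:
            bound = spent_here + longest_from(s)
            heapq.heappush(heap, (-bound, spent_here, path + (s,)))
    return None


def throughput_bound(cycles, clock_hz, cores, packet_bytes):
    """Hypothetical compute-only model: returns (packets/s, bits/s)."""
    pps = cores * clock_hz / cycles
    return pps, pps * packet_bytes * 8


# Hypothetical program: the ext_hdr branch is the longest path on paper,
# but its path constraints are assumed to be unsatisfiable.
graph = {
    "parse": ["ipv4", "ipv6"],
    "ipv4": ["emit"],
    "ipv6": ["ext_hdr", "emit"],
    "ext_hdr": ["emit"],
    "emit": [],
}
cost = {"parse": 40, "ipv4": 60, "ipv6": 90, "ext_hdr": 200, "emit": 30}

result = slowest_feasible_path(
    graph, cost, "parse", lambda path: "ext_hdr" not in path
)
```

In this sketch the search first proposes the 360-cycle path through `ext_hdr`, rejects it because the feasibility check fails, and then settles on the 160-cycle path `parse`, `ipv6`, `emit`. Only the few longest candidates are ever checked, which is the intuition behind analyzing just the slowest program paths. Passing the resulting cycle count to `throughput_bound` together with an assumed clock rate, core count, and packet size yields packet-rate and bit-rate bounds under the simplified model.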
