Application-Agnostic Offloading of Packet Processing

As network speeds increase, servers struggle to serve all requests directed at them. This challenge is rooted in a partitioned data path, where the split between the kernel-space networking stack and user-space applications induces overheads. To address it, we propose Santa, a new architecture that optimizes the data path by enabling server applications to partially offload packet processing to a generic rule processor. We exemplify Santa by showing how it can drastically accelerate kernel-based packet processing, a currently neglected domain. Our evaluation of a broad class of applications, namely DNS, Memcached, and HTTP, shows that Santa can improve server performance by factors of 5.5, 2.1, and 2.5, respectively.
