Application-Agnostic Offloading of Datagram Processing

As network speeds increase, servers struggle to serve all requests directed at them. This challenge is rooted in a partitioned data path, where the split between the kernel-space networking stack and user-space applications induces overhead. To address this challenge, we propose Santa, an architecture that optimizes the data path by enabling server applications to (partially) offload packet processing to a generic rule processor. We exemplify Santa by showing how it can drastically accelerate UDP packet processing in the Linux kernel, a currently neglected domain. Our evaluation focuses on accelerating DNS traffic, for which we find a performance increase by a factor of 5.5 on real-world request patterns.
