MegaPipe: A New Programming Interface for Scalable Network I/O

USENIX Association, 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI '12)

We present MegaPipe, a new API for efficient, scalable network I/O for message-oriented workloads. The design of MegaPipe centers on the abstraction of a channel: a per-core, bidirectional pipe between the kernel and user space, used to exchange both I/O requests and event notifications. On top of the channel abstraction, we introduce three key concepts of MegaPipe: partitioning, lightweight socket (lwsocket), and batching. We implement MegaPipe in Linux and adapt memcached and nginx to use it. Our results show that, by embracing a clean-slate design approach, MegaPipe is able to exploit new opportunities for improved performance and ease of programmability. In microbenchmarks on an 8-core server with 64 B messages, MegaPipe outperforms baseline Linux by between 29% (for long connections) and 582% (for short connections). MegaPipe improves the performance of a modified version of memcached by between 15% and 320%. For a workload based on real-world HTTP traces, MegaPipe boosts the throughput of nginx by 75%.
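To make the channel and batching ideas concrete, the following is a minimal, in-process toy model written for illustration only. It is not the MegaPipe API and does not talk to a kernel: the class `ToyChannel`, its methods `submit`, `flush`, and `dispatch`, and the helper `run` are all hypothetical names. The sketch assumes that each flush of queued requests costs one user/kernel crossing and that every request completes immediately, and it simply counts how many crossings are needed with and without batching.

```python
from collections import deque


class ToyChannel:
    """Toy stand-in for one per-core channel (hypothetical, not the MegaPipe API)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []            # I/O requests queued on the user-space side
        self.completions = deque()   # event notifications flowing back to user space
        self.crossings = 0           # simulated user/kernel boundary crossings

    def submit(self, op, payload):
        """Queue an I/O request; flush automatically once a full batch is queued."""
        self.pending.append((op, payload))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Hand all queued requests to the simulated kernel in one crossing."""
        if not self.pending:
            return
        self.crossings += 1
        for op, payload in self.pending:
            # Assumption: the fake kernel completes every request immediately.
            self.completions.append((op, len(payload)))
        self.pending.clear()

    def dispatch(self):
        """Flush any leftover requests, then drain and return completion events."""
        self.flush()
        events = list(self.completions)
        self.completions.clear()
        return events


def run(n_messages, batch_size):
    """Send n_messages 64-byte writes; return (completions, crossings)."""
    chan = ToyChannel(batch_size)
    for _ in range(n_messages):
        chan.submit("write", b"x" * 64)
    events = chan.dispatch()
    return len(events), chan.crossings


if __name__ == "__main__":
    print(run(1000, 1))    # one crossing per request
    print(run(1000, 32))   # crossings amortized over batches of up to 32
```

In this toy model a real server would hold one such channel per core, so that no request or completion queue is shared between cores; that is the intuition behind partitioning, though the sketch does not model sockets or contention.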
