An Analysis of Linux Scalability to Many Cores

This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48-core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques (this paper introduces one new technique, sloppy counters), these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required a total of 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet.
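The abstract names sloppy counters without describing them. As a rough illustration only, the sketch below models the general idea as a shared central count plus per-core counts of spare references, so that most acquire and release operations touch only core-local state. The class name, method names, and `threshold` parameter are assumptions made for this example and are not taken from the kernel changes; the code is a single-threaded simulation with no atomics or locking.

```python
class SloppyCounter:
    """Single-threaded model of a sloppy reference counter (illustrative only)."""

    def __init__(self, num_cores, threshold=2):
        self.central = 0                # shared count; the contended word in a real kernel
        self.spare = [0] * num_cores    # per-core spare references, touched by one core only
        self.threshold = threshold      # hypothetical limit on spares a core keeps locally

    def acquire(self, core):
        """Take one reference on behalf of `core`."""
        if self.spare[core] > 0:
            self.spare[core] -= 1       # fast path: reuse a local spare reference
        else:
            self.central += 1           # slow path: charge the shared counter

    def release(self, core):
        """Drop one reference on behalf of `core`."""
        self.spare[core] += 1           # keep the reference locally as a spare
        if self.spare[core] > self.threshold:
            # Too many spares: hand them all back to the shared counter.
            self.central -= self.spare[core]
            self.spare[core] = 0

    def in_use(self):
        """True number of held references: central count minus all spares."""
        return self.central - sum(self.spare)


counter = SloppyCounter(num_cores=4, threshold=2)
for _ in range(3):
    counter.acquire(0)                  # three slow-path acquires on core 0
counter.release(0)
counter.release(0)                      # both releases stay local as spares
counter.acquire(0)                      # fast path: no shared-counter update
print(counter.central, counter.spare, counter.in_use())
```

In this model the central count always equals the references in use plus the spares held by all cores, which is why reading the exact value requires summing over the per-core counts. The threshold trades memory held as spares against how often a core must update the shared counter.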
