The scalable commutativity rule

Developing software that scales on multicore processors is an inexact science dominated by guesswork, measurement, and expensive cycles of redesign and reimplementation. Current approaches are workload-driven and, hence, can reveal scalability bottlenecks only for known workloads and available software and hardware. This paper introduces an interface-driven approach to building scalable software. This approach is based on the scalable commutativity rule, which, informally stated, says that whenever interface operations commute, they can be implemented in a way that scales. We formalize this rule and prove it correct for any machine on which conflict-free operations scale, such as current cache-coherent multicore machines. The rule also enables a better design process for scalable software: programmers can now reason about scalability from the earliest stages of interface definition through software design, implementation, and evaluation.
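To make the rule concrete, here is a minimal Python sketch built on a few assumptions. The counter operations `inc`, `inc_and_get`, and `get`, the `commute` helper, and the `ShardedCounter` class are all hypothetical illustrations and do not come from the paper. `commute` is a simplified state-based check, not the paper's formal definition. The sketch shows that two blind increments commute, so they can be implemented so that increments on different cores write disjoint per-core slots and never conflict. `get` does not commute with `inc`, and its implementation has to read every slot.

```python
from typing import Any, Callable, Tuple

# Hypothetical sequential specification of a counter: each operation maps an
# abstract state (an int) to (new_state, return_value).
Op = Callable[[int], Tuple[int, Any]]

def inc(state: int) -> Tuple[int, Any]:
    # Increment without revealing the count.
    return state + 1, None

def inc_and_get(state: int) -> Tuple[int, Any]:
    # Increment and return the new count.
    return state + 1, state + 1

def get(state: int) -> Tuple[int, Any]:
    # Read the count without modifying it.
    return state, state

def commute(a: Op, b: Op, state: int) -> bool:
    """Simplified, state-based check (an assumption, not the paper's formal
    definition): a and b commute in `state` if running them in either order
    gives each operation the same return value and leaves the same state."""
    s_ab, ra1 = a(state)
    s_ab, rb1 = b(s_ab)
    s_ba, rb2 = b(state)
    s_ba, ra2 = a(s_ba)
    return ra1 == ra2 and rb1 == rb2 and s_ab == s_ba

class ShardedCounter:
    """Hypothetical conflict-free implementation of `inc`: each core writes
    only its own slot, so increments from different cores touch disjoint
    memory. A real implementation would also pad each slot to its own cache
    line. Python does not expose cache lines, so this models only which slot
    each operation writes."""

    def __init__(self, ncores: int) -> None:
        self.slots = [0] * ncores

    def inc(self, core: int) -> None:
        # Writes only this core's slot.
        self.slots[core] += 1

    def get(self) -> int:
        # Reads every slot, so it conflicts with concurrent incs.
        return sum(self.slots)

if __name__ == "__main__":
    print(commute(inc, inc, 0))                  # True
    print(commute(inc_and_get, inc_and_get, 0))  # False: return values depend on order
    print(commute(inc, get, 0))                  # False: get observes the increment
```

The design choice follows the rule. `inc` reveals nothing about other increments, so the operations commute and a per-core layout suffices. Changing the interface to `inc_and_get` breaks commutativity, and the rule then no longer guarantees that a conflict-free implementation exists.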
