Callisto: co-scheduling parallel runtime systems

It is increasingly important for parallel applications to run together on the same machine. However, performance is often poor today: programs do not adapt well to a dynamically varying number of cores, and the CPU time received by concurrent jobs can differ drastically. This paper introduces Callisto, a resource management layer for parallel runtime systems. We describe Callisto and the implementation of two Callisto-enabled runtime systems: one for OpenMP, and another for a task-parallel programming model. We show how Callisto eliminates almost all of the scheduler-related interference between concurrent jobs, while still allowing jobs to claim otherwise-idle cores. We illustrate this with examples from two recent graph analytics projects and from the SPEC OMP benchmark suite.
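The abstract states two goals that pull in opposite directions: isolate concurrent jobs from one another, yet let a job use cores that others leave idle. The abstract does not describe how Callisto reconciles them. The sketch below is a hypothetical policy, not Callisto's mechanism. It assumes each job reports a current demand (how many cores it could use right now) and that cores are shared by a fair-share-plus-lending rule. The function name `allocate_cores` and the round-robin lending order are illustrative assumptions.

```python
def allocate_cores(num_cores: int, demands: dict[str, int]) -> dict[str, int]:
    """Hypothetical policy sketch (not Callisto's actual design).

    Each job is guaranteed a fair share of the machine. Cores that a job's
    share covers but its demand does not are treated as idle and lent to
    jobs that want more. `demands` maps a job name to the number of cores
    it could use right now; the result maps each job to its granted cores.
    """
    jobs = sorted(demands)  # deterministic order for ties
    if not jobs:
        return {}

    # Step 1: fair share. The first (num_cores % len(jobs)) jobs get one extra core.
    base, extra = divmod(num_cores, len(jobs))
    share = {j: base + (1 if i < extra else 0) for i, j in enumerate(jobs)}

    # A job never receives more than it asks for out of its guaranteed share,
    # so a busy job cannot push another job below that job's share.
    grant = {j: min(demands[j], share[j]) for j in jobs}

    # Step 2: cores no owner wants right now are otherwise idle.
    idle = num_cores - sum(grant.values())

    # Lend idle cores one at a time, round-robin, to jobs still wanting more.
    while idle > 0:
        hungry = [j for j in jobs if grant[j] < demands[j]]
        if not hungry:
            break  # nobody can use the remaining cores
        for j in hungry:
            if idle == 0:
                break
            grant[j] += 1
            idle -= 1
    return grant


if __name__ == "__main__":
    # Job "b" is mostly idle, so "a" borrows three of b's four cores.
    print(allocate_cores(8, {"a": 8, "b": 1}))
    # When "b" wants its cores back, recomputing returns them.
    print(allocate_cores(8, {"a": 8, "b": 8}))
```

Because the allocation is recomputed from current demands, lent cores flow back to their owner as soon as its demand rises. This mirrors the abstract's combination of isolation (each job keeps its fair share) and work conservation (idle cores are not wasted). A real runtime would also need a way to preempt or hand off borrowed cores promptly, which this sketch does not model.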
