Composable Scheduling for the Heterogeneous Cloud

Modern parallel computing hardware demands increasingly specialized attention to the details of scheduling and load balancing across heterogeneous execution resources that may include GPU and cloud environments, in addition to traditional CPUs. Many existing solutions address the challenges of particular resources, but do so in isolation, and in general do not compose within larger systems. We propose a general, composable abstraction for execution resources, along with a continuation-based meta-scheduler that harnesses those resources in the context of a deterministic parallel programming library for Haskell. We demonstrate performance benefits of combined CPU/GPU scheduling over either alone, and of combined multithreaded/distributed scheduling over existing distributed programming approaches for Haskell.
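To make the idea of a composable resource abstraction concrete, the following is a minimal sketch and not the library's actual API. It assumes a resource can be modelled as a startup action plus a work-search action, and that two resources compose by trying the first and falling back to the second. The names `Resource`, `startup`, `workSearch`, `queueResource`, and `runWorker` are hypothetical and chosen for illustration. The continuation-based part of the meta-scheduler is omitted.

```haskell
module Main where

import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Assumption: a unit of work is modelled as a plain IO action.
type Task = IO ()

-- Hypothetical resource abstraction: how to start the resource,
-- and how an idle worker asks it for work.
data Resource = Resource
  { startup    :: IO ()
  , workSearch :: IO (Maybe Task)
  }

-- Composition: start both resources; when searching for work,
-- try the left resource first and fall back to the right one.
instance Semigroup Resource where
  a <> b = Resource
    { startup    = startup a >> startup b
    , workSearch = do
        found <- workSearch a
        case found of
          Just t  -> pure (Just t)
          Nothing -> workSearch b
    }

-- The empty resource does nothing on startup and never has work.
instance Monoid Resource where
  mempty = Resource (pure ()) (pure Nothing)

-- A toy resource backed by a shared list of pending tasks.
queueResource :: String -> IORef [Task] -> Resource
queueResource name q = Resource
  { startup    = putStrLn ("starting " ++ name)
  , workSearch = atomicModifyIORef' q pop
  }
  where
    pop []       = ([], Nothing)
    pop (t : ts) = (ts, Just t)

-- A single worker loop: run tasks until no resource offers any.
runWorker :: Resource -> IO ()
runWorker r = startup r >> loop
  where
    loop = do
      found <- workSearch r
      case found of
        Nothing -> pure ()
        Just t  -> t >> loop

main :: IO ()
main = do
  cpuQ <- newIORef [putStrLn "cpu task 1", putStrLn "cpu task 2"]
  gpuQ <- newIORef [putStrLn "gpu task 1"]
  runWorker (queueResource "cpu" cpuQ <> queueResource "gpu" gpuQ)
```

In this sketch the order of composition sets the search priority: the worker drains the "cpu" queue before consulting the "gpu" queue. A real scheduler would add work stealing between workers and a policy for choosing among resources.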
