Heartbeat scheduling: provable efficiency for nested parallelism
[1] Robert H. Halstead,et al. Lazy task creation: a technique for increasing the granularity of parallel programs , 1990, LISP and Functional Programming.
[2] Marc Feeley. Polling efficiently on stock hardware , 1993, FPCA '93.
[3] Arthur Charguéraud,et al. Oracle scheduling: controlling granularity in implicitly parallel languages , 2011, OOPSLA '11.
[4] Guy E. Blelloch,et al. A provably time-efficient parallel implementation of full speculation , 1999, TOPLAS.
[5] F. Warren Burton,et al. Executing functional programs on a virtual tree of processors , 1981, FPCA '81.
[6] Robert H. Halstead,et al. Implementation of multilisp: Lisp on a multiprocessor , 1984, LFP '84.
[7] Christos Kozyrakis,et al. Flexible architectural support for fine-grain scheduling , 2010, ASPLOS 2010.
[8] Seth Copen Goldstein,et al. Lazy Threads: Implementing a Fast Parallel Call , 1996, J. Parallel Distributed Comput..
[9] Alexandros Tzannes,et al. Lazy Scheduling: A Runtime Adaptive Scheduler for Declarative Parallelism , 2014, TOPLAS.
[10] James Reinders,et al. Intel® threading building blocks , 2008 .
[11] Charles E. Leiserson,et al. On-the-Fly Pipeline Parallelism , 2015, ACM Trans. Parallel Comput..
[12] Suresh Jagannathan,et al. MultiMLton: A multicore-aware runtime for standard ML , 2014, J. Funct. Program..
[13] David Chase,et al. Dynamic circular work-stealing deque , 2005, SPAA '05.
[14] Arthur Charguéraud,et al. Oracle-guided scheduling for controlling granularity in implicitly parallel languages , 2016, J. Funct. Program..
[15] Guy E. Blelloch,et al. Effectively sharing a cache among threads , 2004, SPAA '04.
[16] Benjamin A. Dent,et al. Burroughs' B6500/B7500 stack mechanism , 1968, AFIPS '68 (Spring).
[17] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[18] Seth Copen Goldstein,et al. Enabling Primitives for Compiling Parallel Languages , 1995, LCR.
[19] Guy E. Blelloch,et al. Space-efficient scheduling of nested parallelism , 1999, TOPLAS.
[20] John M. Mellor-Crummey,et al. A Practical Solution to the Cactus Stack Problem , 2016, SPAA.
[21] Alexandros Tzannes,et al. Lazy Scheduling: A Runtime Adaptive Scheduler , 2014 .
[22] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[23] Christoforos E. Kozyrakis,et al. Flexible architectural support for fine-grain scheduling , 2010, ASPLOS XV.
[24] Guy E. Blelloch,et al. Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures , 2009, SPAA '09.
[25] Saumya K. Debray,et al. A Methodology for Granularity-Based Control of Parallelism in Logic Programs , 1996, J. Symb. Comput..
[26] Umut A. Acar,et al. Hierarchical memory management for mutable state , 2018, PPOPP.
[27] Taiichi Yuasa,et al. Backtracking-based load balancing , 2009, PPoPP '09.
[28] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[29] Guy E. Blelloch,et al. The data locality of work stealing , 2000, SPAA.
[30] Richard P. Brent,et al. The Parallel Evaluation of General Arithmetic Expressions , 1974, JACM.
[31] Simon Marlow,et al. Parallel and Concurrent Programming in Haskell , 2013, CEFP.
[32] Vijaya Ramachandran,et al. Cache-efficient dynamic programming algorithms for multicores , 2008, SPAA '08.
[33] James R. Larus,et al. Using the run-time sizes of data structures to guide parallel-thread creation , 1994, LFP '94.
[34] C. Greg Plaxton,et al. Thread Scheduling for Multiprogrammed Multiprocessors , 1998, SPAA.
[35] Doug Lea,et al. A Java fork/join framework , 2000, JAVA '00.
[36] Edward D. Lazowska,et al. Speedup Versus Efficiency in Parallel Systems , 1989, IEEE Trans. Computers.
[37] Kenjiro Taura,et al. A static cut-off for task parallel programs , 2016, International Conference on Parallel Architecture and Compilation Techniques (PACT).
[38] Alejandro Duran,et al. An adaptive cut-off for task parallelism , 2008, SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[39] J. S. Weening. Parallel execution of LISP programs , 1990 .
[40] Vivek Sarkar,et al. Habanero-Java library: a Java 8 framework for multicore programming , 2014, PPPJ.
[41] Vivek Sarkar,et al. Deadlock-free scheduling of X10 computations with bounded resources , 2007, SPAA '07.
[42] Silas Boyd-Wickizer,et al. Using memory mapping to support cactus stacks in work-stealing runtime systems , 2010, International Conference on Parallel Architectures and Compilation Techniques (PACT).
[43] Guy E. Blelloch,et al. Hierarchical memory management for parallel programs , 2016, ICFP.
[44] Sebastian Burckhardt,et al. The design of a task parallel library , 2009, OOPSLA.
[45] Guy E. Blelloch,et al. Coupling Memory and Computation for Locality Management , 2015, SNAPL.
[46] Arthur Charguéraud,et al. Scheduling parallel programs by work stealing with private deques , 2013, PPoPP '13.
[47] Marc Feeley,et al. A Message Passing Implementation of Lazy Task Creation , 1992, Parallel Symbolic Computing.
[48] Alexandros Tzannes,et al. Lazy binary-splitting: a run-time adaptive work-stealing scheduler , 2010, PPoPP '10.
[49] Guy E. Blelloch,et al. Brief announcement: the problem based benchmark suite , 2012, SPAA '12.
[50] Seth Copen Goldstein,et al. Enabling Primitives for Compiling Parallel Languages , 1995 .
[51] David R. O'Hallaron,et al. Languages, Compilers and Run-Time Systems for Scalable Computers , 1998, Springer US.
[52] Guy E. Blelloch,et al. Scheduling irregular parallel computations on hierarchical caches , 2011, SPAA '11.
[53] Charles E. Leiserson,et al. Space-Efficient Scheduling of Multithreaded Computations , 1998, SIAM J. Comput..
[54] Joseph S. Weening,et al. Low-Cost Process Creation and Dynamic Partitioning in Qlisp , 1989, Workshop on Parallel Lisp.
[55] Guy E. Blelloch,et al. Internally deterministic parallel algorithms can be fast , 2012, PPoPP '12.
[56] Guy E. Blelloch,et al. Provably efficient scheduling for languages with fine-grained parallelism , 1999, JACM.
[57] John H. Reppy,et al. Implicitly-threaded parallelism in Manticore , 2008, J. Funct. Program..