Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
[1] Guy E. Blelloch,et al. Pipelining with Futures , 1997, SPAA '97.
[2] Guy E. Blelloch,et al. Space-efficient scheduling of parallelism with synchronization variables , 1997, SPAA '97.
[3] Guy E. Blelloch,et al. A provably time-efficient parallel implementation of full speculation , 1999, TOPLAS.
[4] Charles E. Leiserson,et al. Deterministic parallel random-number generation for dynamic-multithreading platforms , 2012, PPoPP '12.
[5] Zvi Galil,et al. Parallel Algorithms for Dynamic Programming Recurrences with More than O(1) Dependency , 1994, J. Parallel Distributed Comput.
[6] Robert A. van de Geijn,et al. Elemental: A New Framework for Distributed Memory Dense Matrix Computations , 2013, TOMS.
[7] Thomas Hérault,et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[8] Silas Boyd-Wickizer,et al. Using memory mapping to support cactus stacks in work-stealing runtime systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[9] Todd Mytkowicz,et al. Parallelizing dynamic programming through rank convergence , 2014, PPoPP '14.
[10] C. Greg Plaxton,et al. Thread Scheduling for Multiprogrammed Multiprocessors , 1998, SPAA '98.
[11] Ernie Chan,et al. Runtime Data Flow Graph Scheduling of Matrix Computations with Multiple Hardware Accelerators , 2010, FLAME Working Note #50.
[12] Robert A. van de Geijn,et al. The science of deriving dense linear algebra algorithms , 2005, TOMS.
[13] Guy E. Blelloch,et al. Space-efficient scheduling for parallel, multithreaded computations , 1999 .
[14] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl.
[15] Charles E. Leiserson,et al. Executing task graphs using work-stealing , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[16] Vijaya Ramachandran,et al. Oblivious algorithms for multicores and network of processors , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[17] Xin-She Yang,et al. Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.
[18] Guy E. Blelloch,et al. Provably efficient scheduling for languages with fine-grained parallelism , 1995, SPAA '95.
[19] George Bosilca,et al. Hierarchical DAG Scheduling for Hybrid Distributed Systems , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[20] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[21] Harsha Vardhan Simhadri,et al. Program-Centric Cost Models for Locality and Parallelism , 2013 .
[22] Haibin Kan,et al. Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency , 2015, PPoPP.
[23] Guy E. Blelloch,et al. Scheduling irregular parallel computations on hierarchical caches , 2011, SPAA '11.
[24] Guy E. Blelloch,et al. Experimental Analysis of Space-Bounded Schedulers , 2016, ACM Trans. Parallel Comput.
[25] Charles E. Leiserson,et al. Space-efficient scheduling of multithreaded computations , 1993, SIAM J. Comput.
[26] Jack Dongarra,et al. Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA , 2011 .
[27] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[28] Guy E. Blelloch,et al. Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures , 2009, SPAA '09.
[29] O. Gotoh. An improved algorithm for matching biological sequences. , 1982, Journal of molecular biology.
[30] Guy E. Blelloch,et al. Low depth cache-oblivious algorithms , 2010, SPAA '10.
[31] Michael A. Bender,et al. Cache-Adaptive Algorithms , 2014, SODA.
[32] Thomas Hérault,et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability , 2013, Computing in Science & Engineering.
[33] Stephen Warshall,et al. A Theorem on Boolean Matrices , 1962, JACM.
[34] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997, SIAM J. Matrix Anal. Appl.
[35] Guy E. Blelloch,et al. Effectively sharing a cache among threads , 2004, SPAA '04.
[36] Guy E. Blelloch,et al. The Data Locality of Work Stealing , 2002, SPAA '00.
[37] Timothy A. Davis,et al. A Concurrent Dynamic Task Graph , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[38] Daniel P. Friedman,et al. Aspects of Applicative Programming for Parallel Processing , 1978, IEEE Transactions on Computers.
[39] Bowen Alpern,et al. Modeling parallel computers as memory hierarchies , 1993, Proceedings of Workshop on Programming Models for Massively Parallel Computers.
[40] Jack Dongarra,et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009 .
[41] Richard Cole,et al. Efficient Resource Oblivious Algorithms for Multicores , 2011, ArXiv.
[42] Ronald L. Rivest,et al. Introduction to Algorithms, third edition , 2009 .
[43] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[44] Carl Hewitt,et al. The incremental garbage collection of processes , 1977, Artificial Intelligence and Programming Languages.
[45] Maurice Herlihy,et al. Well-Structured Futures and Cache Locality , 2013, PPoPP.
[46] Charles E. Leiserson,et al. On-the-Fly Pipeline Parallelism , 2015, ACM Trans. Parallel Comput.