Towards Unifying OpenMP Under the Task-Parallel Paradigm - Implementation and Performance of the taskloop Construct
[1] Alejandro Duran, et al. The Design of OpenMP Tasks, 2009, IEEE Transactions on Parallel and Distributed Systems.
[2] L. M. Ni, et al. Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers, 1993, IEEE Trans. Parallel Distributed Syst.
[3] Basilio B. Fraguela, et al. A Generic Algorithm Template for Divide-and-Conquer in Multicore Systems, 2010, 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC).
[4] Martin Schulz, et al. Scalable Critical-Path Based Performance Analysis, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[5] Guy E. Blelloch, et al. The data locality of work stealing, 2000, SPAA.
[6] Artur Podobas, et al. Using Transactional Memory to Avoid Blocking in OpenMP Synchronization Directives - Don't Wait, Speculate!, 2015, IWOMP.
[7] Alejandro Duran, et al. OmpSs: A Proposal for Programming Heterogeneous Multi-Core Architectures, 2011, Parallel Process. Lett.
[8] Michael Voss, et al. Runtime empirical selection of loop schedulers on hyperthreaded SMPs, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[9] Constantine D. Polychronopoulos, et al. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers, 1987, IEEE Transactions on Computers.
[10] Robert H. Halstead, et al. Lazy task creation: a technique for increasing the granularity of parallel programs, 1990, IEEE Trans. Parallel Distributed Syst.
[11] Charles E. Leiserson, et al. The Cilk++ concurrency platform, 2009, 2009 46th ACM/IEEE Design Automation Conference.
[12] Seth Copen Goldstein, et al. Lazy Threads: Implementing a Fast Parallel Call, 1996, J. Parallel Distributed Comput.
[13] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[14] Mihai Burcea, et al. An Adaptive OpenMP Loop Scheduler for Hyperthreaded SMPs, 2004, PDCS.
[15] Rudolf Eigenmann, et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance, 2001, WOMPAT.
[16] Piyush Kumar. Cache Oblivious Algorithms, 2002, Algorithms for Memory Hierarchies.
[17] Vladimir Vlassov, et al. TurboBŁYSK: Scheduling for Improved Data-Driven Task Performance with Fast Dependency Resolution, 2014, IWOMP.