Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages
Karine Heydemann | Albert Cohen | Antoniu Pop | Andi Drebes | Nathalie Drach-Temam