Runtime Systems for Extreme Scale Platforms
[1] Stephen L. Olivier,et al. UTS: An Unbalanced Tree Search Benchmark , 2006, LCPC.
[2] Arvind,et al. M-Structures: Extending a Parallel, Non-strict, Functional Language with State , 1991, FPCA.
[3] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[4] Volker Strumpen,et al. Cache oblivious stencil computations , 2005, ICS '05.
[5] Katherine A. Yelick,et al. Multi-threading and one-sided communication in parallel LU factorization , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[6] Carl Hewitt,et al. The incremental garbage collection of processes , 1977, Artificial Intelligence and Programming Languages.
[7] Katherine A. Yelick,et al. Titanium: A High-performance Java Dialect , 1998, Concurr. Pract. Exp.
[8] Bowen Alpern,et al. Modeling parallel computers as memory hierarchies , 1993, Proceedings of Workshop on Programming Models for Massively Parallel Computers.
[9] William N. Scherer,et al. A new vision for coarray Fortran , 2009, PGAS '09.
[10] Vivek Sarkar. Synchronization using counting semaphores , 1988, ICS '88.
[11] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Boris D. Lubachevsky. Synchronization barrier and related tools for shared memory parallel programming , 2005, International Journal of Parallel Programming.
[13] Eduard Ayguadé,et al. Overlapping communication and computation by using a hybrid MPI/SMPSs approach , 2010, ICS '10.
[14] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[15] Dhabaleswar K. Panda,et al. MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[16] Andrea C. Arpaci-Dusseau,et al. Parallel programming in Split-C , 1993, Supercomputing '93. Proceedings.
[17] Jason Duell,et al. Productivity and performance using partitioned global address space languages , 2007, PASCO '07.
[18] Rajeev Thakur,et al. Test suite for evaluating performance of multithreaded MPI communication , 2009, Parallel Comput.
[19] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[20] Per Brinch Hansen. The Origin of Concurrent Programming , 2002, Springer New York.
[21] Vivek Sarkar,et al. Habanero-Java: the new adventures of old X10 , 2011, PPPJ.
[22] Katherine Yelick,et al. Introduction to UPC and Language Specification , 2000 .
[23] Stephen L. Olivier,et al. Dynamic Load Balancing of Unbalanced Computations Using Message Passing , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[24] Barbara Chapman,et al. Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation) , 2007 .
[25] Vivek Sarkar,et al. Communication Optimizations for Distributed-Memory X10 Programs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[26] Sayantan Sur,et al. Unifying UPC and MPI runtimes: experience with MVAPICH , 2010, PGAS '10.
[27] William J. Dally,et al. Sequoia: Programming the Memory Hierarchy , 2006, International Conference on Software Composition.
[28] David A. Padua,et al. Programming for parallelism and locality with hierarchically tiled arrays , 2006, PPoPP '06.
[29] Paul N. Hilfinger,et al. Better Tiling and Array Contraction for Compiling Scientific Programs , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[30] David Chase,et al. Dynamic circular work-stealing deque , 2005, SPAA '05.
[31] Alexandros Stamatakis,et al. Hybrid MPI/Pthreads parallelization of the RAxML phylogenetics code , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[32] Yves Robert,et al. Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC , 2013, Euro-Par Workshops.
[33] Katherine A. Yelick,et al. Hybrid PGAS runtime support for multicore nodes , 2010, PGAS '10.
[34] Rolf Riesen,et al. Portals 3.0: protocol building blocks for low overhead communication , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[35] Michael L. Scott,et al. Fast, contention-free combining tree barriers for shared-memory multiprocessors , 1994, International Journal of Parallel Programming.
[36] Graph Topology. MPI at Exascale , 2010 .
[37] Vivek Sarkar,et al. Comparing the usability of library vs. language approaches to task parallelism , 2010, PLATEAU '10.
[38] Rohit Chandra,et al. Parallel programming in OpenMP , 2000 .
[39] Haoqiang Jin,et al. Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks , 2004, IPDPS.
[40] Franck Cappello,et al. MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[41] Wu-chun Feng,et al. On the efficacy of GPU-integrated MPI for scientific applications , 2013, HPDC '13.
[42] D. Panda,et al. Extending OpenSHMEM for GPU Computing , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[43] Lei Huang,et al. Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation , 2010, LCPC.
[44] Stephen A. Jarvis,et al. Performance analysis of a hybrid MPI/CUDA implementation of the NASLU benchmark , 2011, PERV.
[45] Georg Hager,et al. Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes , 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.
[46] Katherine A. Yelick,et al. Optimizing bandwidth limited problems using one-sided communication and overlap , 2005, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[47] Katherine Yelick,et al. Titanium Language Reference Manual , 2001 .
[48] Vivek Sarkar,et al. Phasers: a unified deadlock-free construct for collective and point-to-point synchronization , 2008, ICS '08.
[49] Vivek Sarkar,et al. Phaser accumulators: A new reduction construct for dynamic parallelism , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[50] Laxmikant V. Kalé,et al. Adaptive MPI , 2003, LCPC.
[51] Stephen A. Edwards,et al. Compile-Time Analysis and Specialization of Clocks in Concurrent Programs , 2009, CC.
[52] James Reinders,et al. Intel® threading building blocks , 2008 .
[53] Vivek Sarkar,et al. Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures , 2011, Euro-Par.
[54] Guillaume Mercier,et al. Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem , 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).
[55] John C. Reynolds,et al. The discoveries of continuations , 1993, LISP Symb. Comput.
[56] Dhabaleswar K. Panda,et al. Scalable Earthquake Simulation on Petascale Supercomputers , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[57] Vivek Sarkar,et al. Integrating Asynchronous Task Parallelism with MPI , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[58] Dong Li,et al. Hybrid MPI/OpenMP power-aware computing , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[59] Yvon Jégou,et al. Task Migration and Fine Grain Parallelism on Distributed Memory Architectures , 1997, PaCT.
[60] Bradley C. Kuszmaul,et al. The Pochoir stencil compiler , 2011, SPAA '11.
[61] Alejandro Duran,et al. Productive Cluster Programming with OmpSs , 2011, Euro-Par.
[62] Franck Cappello,et al. Performance characteristics of a network of commodity multiprocessors for the NAS benchmarks using a hybrid memory model , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[63] Keshav Pingali,et al. I-structures: Data structures for parallel computing , 1986, Graph Reduction.
[64] Georg Hager,et al. Hybrid MPI and OpenMP Parallel Programming , 2006, PVM/MPI.
[65] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[66] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[67] Dan Bonachea. GASNet Specification, v1.1 , 2002 .
[68] Nicholas Carriero,et al. Linda and Friends , 1986, Computer.
[69] Victor Luchangco,et al. The Fortress Language Specification Version 1.0 , 2007 .
[70] Jonathan Green,et al. Multi-core and Network Aware MPI Topology Functions , 2011, EuroMPI.
[71] Guy E. Blelloch,et al. Vector Models for Data-Parallel Computing , 1990 .
[72] Alejandro Duran,et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures , 2011, Parallel Process. Lett.
[73] Hasan U. Akay,et al. Hybrid Parallelism for CFD Simulations: Combining MPI with OpenMP , 2009 .
[74] Tao Yang,et al. Run-Time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures , 1997, J. Parallel Distributed Comput.
[75] Laxmikant V. Kalé,et al. Work stealing and persistence-based load balancers for iterative overdecomposed applications , 2012, HPDC '12.
[76] Kourosh Gharachorloo,et al. Fine-grain software distributed shared memory on SMP clusters , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[77] David A. Bader,et al. A novel FDTD application featuring OpenMP-MPI hybrid parallelization , 2004 .
[78] Robert H. Halstead,et al. Implementation of Multilisp: Lisp on a multiprocessor , 1984, LFP '84.
[79] Debra Hensgen,et al. Two algorithms for barrier synchronization , 1988, International Journal of Parallel Programming.
[80] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[81] Vivek Sarkar,et al. Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement , 2009, LCPC.
[82] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[83] Philippe Olivier Alexandre Navaux,et al. Challenges and Issues of Supporting Task Parallelism in MPI , 2010, EuroMPI.
[84] Guang R. Gao,et al. TiNy threads: a thread virtual machine for the Cyclops64 cellular architecture , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[85] Yi Guo,et al. Work-first and help-first scheduling policies for async-finish task parallelism , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[86] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[87] Thomas L. Sterling,et al. ParalleX: An Advanced Parallel Execution Model for Scaling-Impaired Applications , 2009, 2009 International Conference on Parallel Processing Workshops.
[88] Katherine Yelick,et al. Auto-tuning stencil codes for cache-based multicore platforms , 2009 .
[89] Sachin S. Sapatnekar,et al. A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers , 1997, IEEE Trans. Parallel Distributed Syst.
[90] Vivek Sarkar,et al. Data-Driven Tasks and Their Implementation , 2011, 2011 International Conference on Parallel Processing.
[91] Emmanuel Jeannot,et al. Near-Optimal Placement of MPI Processes on Hierarchical NUMA Architectures , 2010, Euro-Par.
[92] Vivek Sarkar,et al. Hierarchical phasers for scalable synchronization and reductions in dynamic parallelism , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[93] Anoop Gupta,et al. COOL: An object-based language for parallel programming , 1994, Computer.
[94] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl.
[95] Eugene D. Brooks,et al. The butterfly barrier , 1986, International Journal of Parallel Programming.
[96] Fiona Reid,et al. A Microbenchmark Suite for OpenMP Tasks , 2012, IWOMP.