Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs
José Nelson Amaral | Xavier Martorell | Michail Alvanos | Montse Farreras | Ettore Tiotto
[1] Torsten Hoefler,et al. The PERCS High-Performance Interconnect , 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[2] Joel H. Saltz,et al. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures , 1994, J. Parallel Distributed Comput..
[3] Clifford Stein,et al. Introduction to Algorithms, 2nd edition , 2001 .
[4] Shigeru Chiba,et al. A New Optimization Technique for the Inspector-Executor Method , 2002, IASTED PDCS.
[5] Jimmy Su,et al. Automatic support for irregular computations in a high-level language , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[6] Balaram Sinharoy,et al. POWER7: IBM's next generation server processor , 2010, 2009 IEEE Hot Chips 21 Symposium (HCS).
[7] Katherine A. Yelick,et al. Titanium: A High-performance Java Dialect , 1998, Concurr. Pract. Exp..
[8] John M. Mellor-Crummey,et al. Effective communication coalescing for data-parallel applications , 2005, PPOPP.
[9] Zhang Zhang,et al. A UPC runtime system based on MPI and POSIX threads , 2006, 14th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP'06).
[10] Edith Schonberg,et al. A Unified Framework for Optimizing Communication in Data-Parallel Programs , 1996, IEEE Trans. Parallel Distributed Syst..
[11] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[12] Ramakrishnan Rajamony,et al. PERCS: The IBM POWER7-IH high-performance computing system , 2011, IBM J. Res. Dev..
[13] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[14] Mark N. Wegman,et al. Efficiently computing static single assignment form and the control dependence graph , 1991, TOPLAS.
[15] José Nelson Amaral,et al. An unified parallel C compiler that implements automatic communication aggregation , 2009 .
[16] Xunhao Li,et al. Jit4OpenCL: a compiler from Python to OpenCL , 2011 .
[17] José Nelson Amaral,et al. Improving communication in PGAS environments: static and dynamic coalescing in UPC , 2013, ICS '13.
[18] Sverre J. Aarseth,et al. Gravitational N-Body Simulations , 2003 .
[19] Yunheung Paek,et al. Efficient and precise array access analysis , 2002, TOPLAS.
[20] Victor Luchangco,et al. The Fortress Language Specification Version 1.0 , 2007 .
[21] José Nelson Amaral,et al. Compiling Python to a hybrid execution environment , 2010, GPGPU-3.
[22] Rafael Asenjo,et al. Global Data Re-allocation via Communication Aggregation in Chapel , 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.
[23] José Nelson Amaral,et al. Shared memory programming for large scale machines , 2006, PLDI '06.
[24] José Nelson Amaral,et al. Reducing Compiler-Inserted Instrumentation in Unified-Parallel-C Code Generation , 2014, 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing.
[25] Michail Alvanos,et al. Memory Management Techniques for Exploiting RDMA in PGAS Languages , 2014, LCPC.
[26] David A. Padua,et al. Compiling for a Hybrid Programming Model Using the LMAD Representation , 2001, LCPC.
[27] Michail Alvanos,et al. Performance Analysis of the IBM XL UPC on the PERCS Architecture , 2013 .
[28] Sverre J. Aarseth. Gravitational N-Body Simulations: Tools and Algorithms , 2003 .
[29] Xavier Martorell,et al. Automatic communication coalescing for irregular computations in UPC language , 2012, CASCON.
[30] Katherine A. Yelick,et al. Communication optimizations for fine-grained UPC applications , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[31] Peter Brezany,et al. SVM Support in the Vienna Fortran Compilation System , 1994 .
[32] Kemal Ebcioğlu,et al. X10: Programming for Hierarchical Parallelism and Non-Uniform Data Access (Extended , 2004 .
[33] Ronald L. Rivest,et al. Introduction to Algorithms , 1990 .
[34] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[35] Daisuke Takahashi,et al. The HPC Challenge (HPCC) benchmark suite , 2006, SC.
[36] Charles Koelbel,et al. Compiling Global Name-Space Parallel Loops for Distributed Execution , 1991, IEEE Trans. Parallel Distributed Syst..
[37] Katherine Yelick,et al. Optimizing partitioned global address space programs for cluster architectures , 2007 .
[38] Tarek A. El-Ghazawi,et al. UPC Performance and Potential: A NPB Experimental Study , 2002, ACM/IEEE SC 2002 Conference (SC'02).