Smartapps, an application centric approach to high performance computing: compiler-assisted software and hardware support for reduction operations
Nancy M. Amato | Josep Torrellas | Lawrence Rauchwerger | Milos Prvulovic | Hao Yu | Ye Zhang | María Jesús Garzarán | Alin Jula | Francis H. Dang
[1] Harry A. G. Wijshoff,et al. A quantitative comparison of parallel computation models , 1998 .
[2] Anant Agarwal,et al. Evaluating the performance of software cache coherence , 1989, ASPLOS III.
[3] Geppino Pucci,et al. A Cost Model for Communication on a Symmetric MultiProcessor , 1998 .
[4] Guy E. Blelloch,et al. Accounting for memory bank contention and delay in high-bandwidth multiprocessors , 1995, SPAA '95.
[5] Nancy M. Amato,et al. Predicting performance on SMPs. A case study: the SGI Power Challenge , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[6] Elizabeth Shriver,et al. Attribute-managed storage , 1995 .
[7] Markus Mock,et al. A retrospective on: "an evaluation of staged run-time optimizations in DyC" , 2004, SIGP.
[8] Mark Horowitz,et al. Modeling the performance of limited pointers directories for cache coherence , 1991, ISCA '91.
[9] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[10] Josep Torrellas,et al. Speculative Parallel Execution of Loops with Cross-Iteration Dependences in DSM Multiprocessors , 1997, HPCA 1997.
[11] Michael Stumm,et al. Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system , 1999, OSDI '99.
[12] Elizabeth Shriver. Performance modeling for realistic storage devices , 1997 .
[13] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[14] Lawrence Rauchwerger,et al. Speculative Parallelization of Partially Parallel Loops , 2000, LCR.
[15] Nancy M. Amato,et al. Comparing the memory system performance of the HP V-class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications , 1999, ICS '99.
[16] Robert S. Schreiber,et al. HPF-2 scope of activities and motivating applications , 1994 .
[17] Chau-Wen Tseng,et al. Improving compiler and run-time support for adaptive irregular codes , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[18] Arif Merchant,et al. An analytic behavior model for disk drives with readahead caches and request reordering , 1998, SIGMETRICS '98/PERFORMANCE '98.
[19] Nancy M. Amato,et al. Run-time methods for parallelizing partially parallel loops , 1995, ICS '95.
[20] Lawrence Rauchwerger,et al. The R-LRPD test: speculative parallelization of partially parallel loops , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[21] Josep Torrellas,et al. Analysis of Critical Architectural and Program Parameters in a Hierarchical Shared Memory Multiprocessor , 1990, SIGMETRICS.
[22] Hubertus Franke,et al. Customization Lite , 1997 .
[23] Robert J. Fowler,et al. MINT: a front end for efficient simulation of shared-memory multiprocessors , 1994, Proceedings of International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[24] Andrew Rau-Chaplin,et al. Scalable parallel geometric algorithms for coarse grained multicomputers , 1993, SCG '93.
[25] John L. Henning. SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.
[26] Nancy M. Amato,et al. Smartapps, an application centric approach to high performance computing: compiler-assisted software and hardware support for reduction operations , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[27] Yunheung Paek,et al. Parallel Programming with Polaris , 1996, Computer.
[28] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[29] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[30] Yossi Matias,et al. Can shared-memory model serve as a bridging model for parallel computation? , 1997, SPAA '97.
[31] Lawrence Rauchwerger,et al. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization , 1995, PLDI '95.
[32] Lawrence Rauchwerger,et al. Parallelizing while loops for multiprocessor systems , 1995, Proceedings of 9th International Parallel Processing Symposium.
[33] Ben H. H. Juurlink,et al. A quantitative comparison of parallel computation models , 1996, SPAA '96.
[34] Josep Torrellas,et al. Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[35] Josep Torrellas,et al. A direct-execution framework for fast and accurate simulation of superscalar processors , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[36] Michael Stumm,et al. HFS: a performance-oriented flexible file system based on building-block compositions , 1996, IOPADS '96.
[37] Anoop Gupta,et al. The Stanford FLASH multiprocessor , 1994, ISCA '94.
[38] Dileep Bhandarkar,et al. Performance characterization of the Pentium Pro processor , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[39] Mary K. Vernon,et al. Comparison of hardware and software cache coherence schemes , 1991, ISCA '91.
[40] M. Karplus,et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations , 1983 .
[41] Susan J. Eggers,et al. A case for runtime code generation , 1993 .
[42] Lawrence Rauchwerger,et al. Adaptive reduction parallelization techniques , 2000, ICS '00.
[43] Vikram S. Adve,et al. Comparison of hardware and software cache coherence schemes , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.
[44] Josep Torrellas,et al. Architectural support for parallel reductions in scalable shared-memory multiprocessors , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[45] Lawrence Rauchwerger,et al. Run-time parallelization: A framework for parallel computation , 1995 .
[46] Erik Brunvand,et al. Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[47] Yunheung Paek,et al. Advanced Program Restructuring for High-Performance Computers with Polaris , 2000 .
[48] J. Mark Bull,et al. Feedback Guided Dynamic Loop Scheduling: Algorithms and Experiments , 1998, Euro-Par.