A scalable method for run-time loop parallelization
[1] Steven J. Plimpton,et al. Massively parallel methods for engineering and science problems , 1994, CACM.
[2] P. Sadayappan,et al. An approach to synchronization for parallel computing , 1988, ICS '88.
[3] B. J. Smith,et al. A pipelined, shared resource MIMD computer , 1986 .
[4] Iain S. Duff,et al. MA28 --- A set of Fortran subroutines for sparse unsymmetric linear equations , 1980 .
[5] David A. Padua,et al. Compiler Algorithms for Synchronization , 1987, IEEE Transactions on Computers.
[6] Panagiotis Takis Metaxas. Parallel algorithms for graph problems , 1992 .
[7] J. E. Thornton. Design of a Computer: The Control Data 6600 , 1970 .
[8] David A. Padua,et al. Automatic Array Privatization , 1993, Compiler Optimizations for Scalable Parallel Systems Languages.
[9] Lawrence Rauchwerger,et al. Parallelizing while loops for multiprocessor systems , 1995, Proceedings of 9th International Parallel Processing Symposium.
[10] Zhiyuan Li. Array privatization for parallel execution of loops , 1992, ICS.
[11] Nancy M. Amato,et al. Run-time methods for parallelizing partially parallel loops , 1995, ICS '95.
[12] Rudolf Eigenmann,et al. Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs , 1992, IEEE Trans. Parallel Distributed Syst.
[13] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .
[14] Sajal K. Das,et al. Book Review: Introduction to Parallel Algorithms and Architectures : Arrays, Trees, Hypercubes by F. T. Leighton (Morgan Kaufmann Pub, 1992) , 1992, SIGA.
[15] Jay Hoeflinger,et al. Cedar Fortran and other vector and parallel Fortran dialects , 1988, Proceedings. SUPERCOMPUTING '88.
[16] Lawrence Rauchwerger,et al. The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization , 1994, ICS '94.
[17] Utpal Banerjee,et al. Dependence analysis for supercomputing , 1988, The Kluwer international series in engineering and computer science.
[18] Harry Berryman,et al. A manual for PARTI runtime primitives , 1990 .
[19] Pen-Chung Yew,et al. A Scheme to Enforce Data Dependence on Large Multiprocessor Systems , 1987, IEEE Trans. Software Eng..
[20] David A. Padua,et al. Dependence graphs and compiler optimizations , 1981, POPL '81.
[21] Peter M. Schwarz,et al. Experience Using Multiprocessor Systems—A Status Report , 1980, CSUR.
[22] Monica S. Lam,et al. Data Dependence and Data-Flow Analysis of Arrays , 1992, LCPC.
[23] Joel H. Saltz,et al. The Preprocessed Doacross Loop , 1991, ICPP.
[24] Joel H. Saltz,et al. Run-time parallelization and scheduling of loops , 1989, SPAA '89.
[25] Larry Rudolph,et al. Efficient parallel algorithms for graph problems , 1990, Algorithmica.
[26] Geoffrey C. Fox,et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers , 1989, Int. J. High Perform. Comput. Appl..
[27] Lawrence Rauchwerger,et al. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization , 1995, PLDI '95.
[28] Josep Torrellas,et al. An efficient algorithm for the run-time parallelization of DOACROSS loops , 1994, Proceedings of Supercomputing '94.
[29] Barbara M. Chapman,et al. Supercompilers for parallel and vector computers , 1990, ACM Press frontier series.
[30] Joel H. Saltz,et al. The doconsider loop , 1989, ICS '89.
[31] José E. Moreira,et al. Autoscheduling in a Distributed Shared-Memory Environment , 1994, LCPC.
[32] Harry Berryman,et al. Runtime Compilation Methods for Multicomputers , 1991, International Conference on Parallel Processing.
[33] David A. Padua,et al. Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs , 1991, LCPC.
[34] Daniel Gajski,et al. CEDAR: a large scale multiprocessor , 1983, CARN.
[35] Wilson C. Hsieh,et al. Automatic generation of nested, fork-join parallelism , 2004, The Journal of Supercomputing.
[36] F. Leighton,et al. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes , 1991 .
[37] David A. Padua,et al. Array privatization for shared and distributed memory machines (extended abstract) , 1993, SIGP.
[38] David A. Padua,et al. Advanced compiler optimizations for supercomputers , 1986, CACM.
[39] John Zahorjan,et al. Improving the performance of runtime parallelization , 1993, PPOPP '93.
[40] Constantine D. Polychronopoulos. Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design , 1988, IEEE Trans. Computers.