Interprocedural Analysis for Loop Scheduling and Data Allocation
[1] Geoffrey C. Fox,et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers , 1989, Int. J. High Perform. Comput. Appl..
[2] Constantine D. Polychronopoulos,et al. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers , 1987, IEEE Transactions on Computers.
[3] Monica S. Lam,et al. Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.
[4] Jingke Li,et al. Index domain alignment: minimizing cost of cross-referencing between distributed arrays , 1990, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation.
[5] Keith D. Cooper,et al. An experiment with inline substitution , 1991, Softw. Pract. Exp..
[6] P.-S. Tseng. A parallelizing compiler for distributed memory parallel computers , 1989, PLDI 1989.
[7] Ken Kennedy,et al. Procedure cloning , 1992, Proceedings of the 1992 International Conference on Computer Languages.
[8] Anne M. Holler. A Study of the Effects of Subprogram Inlining , 1991 .
[9] Keshav Pingali,et al. Access normalization: loop restructuring for NUMA computers , 1993, TOCS.
[10] P. Sadayappan,et al. Communication-Free Hyperplane Partitioning of Nested Loops , 1993, J. Parallel Distributed Comput..
[11] P. Sadayappan,et al. Communication-Free Hyperplane Partitioning of Nested Loops , 1991, LCPC.
[12] Jack E. Veenstra,et al. Mint Tutorial and User Manual , 1993 .
[13] Vivek Sarkar,et al. Optimization of array accesses by collective loop transformations , 1991, ICS '91.
[14] Trung N. Nguyen,et al. Interprocedural compiler analysis for reducing memory latency , 1996 .
[15] Anant Agarwal,et al. Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors , 1995, IEEE Trans. Parallel Distributed Syst..
[16] Pierre Jouvelot,et al. Semantical interprocedural parallelization: an overview of the PIPS project , 1991 .
[17] Richard E. Kessler,et al. Page placement algorithms for large real-indexed caches , 1992, TOCS.
[18] William H. Press,et al. Numerical recipes : the art of scientific computing : FORTRAN version , 1989 .
[19] K. Kennedy,et al. Automatic Data Layout for High Performance Fortran , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[20] Keith D. Cooper,et al. Unexpected side effects of inline substitution: a case study , 1992, LOPL.
[21] Arogyaswami Paulraj,et al. Loop partitioning for distributed memory multiprocessors as unimodular transformations , 1991, ICS '91.
[22] Henry G. Dietz,et al. Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation , 1991, LCPC.
[23] Zhiyuan Li,et al. Experience with efficient array data flow analysis for array privatization , 1997, PPOPP '97.
[24] Monica S. Lam,et al. Maximizing parallelism and minimizing synchronization with affine transforms , 1997, POPL '97.
[25] Zhiyuan Li,et al. An Empirical Study of the Workload Distribution under Static Scheduling , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[26] Monica S. Lam,et al. Data and computation transformations for multiprocessors , 1995, PPOPP '95.
[27] Monica S. Lam,et al. An Overview of a Compiler for Scalable Parallel Machines , 1993, LCPC.
[28] Marina C. Chen,et al. The Data Alignment Phase in Compiling Programs for Distributed-Memory Machines , 1991, J. Parallel Distributed Comput..