Interprocedural Analysis for Loop Scheduling and Data Allocation

Abstract. To reduce remote memory accesses on CC-NUMA multiprocessors, we present an interprocedural analysis that supports static loop scheduling and data allocation. Given a parallelized program, the compiler constructs graphs that represent, globally and interprocedurally, the remote reference penalties associated with different choices of loop scheduling and data allocation. After deriving an optimal solution from those graphs, the compiler generates data allocation directives and schedules the DOALL loops. Experiments indicate that the proposed compiler scheme is efficient, and simulation results show good performance of the parallel code.
