
Interprocedural Load Elimination for Dynamic Optimization of Parallel Programs

Load elimination is a classical compiler transformation that is increasing in importance for multi-core and many-core architectures. The effect of the transformation is to replace a memory access, such as a read of an object field or an array element, by a read of a compiler-generated temporary that can be allocated in faster and more energy-efficient storage structures such as registers and local memories (scratchpads). Unfortunately, current just-in-time and dynamic compilers perform load elimination only in limited situations. In particular, they usually make worst-case assumptions about potential side effects arising from parallel constructs and method calls. These two constraints interact with each other since parallel constructs are usually translated to low-level runtime library calls. In this paper, we introduce an interprocedural load elimination algorithm suitable for use in dynamic optimization of parallel programs. The main contributions of the paper include: a) an algorithm for load elimination in the presence of three core parallel constructs -- async, finish, and isolated, b) efficient side-effect analysis for method calls, c) extended side-effect analysis for parallel constructs using an Isolation Consistency memory model, and d) performance results to study the impact of load elimination on a set of standard benchmarks using an implementation of the algorithm in Jikes RVM for optimizing programs written in a subset of the X10 v1.5 language. Our performance results show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when comparing the algorithm in this paper with the load elimination algorithm available in Jikes RVM.
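The transformation described in the abstract can be illustrated with a small hand-written Java sketch. It is not taken from the paper: the names `LoadEliminationSketch`, `Point`, `helper`, `before`, and `after` are hypothetical. It shows a repeated field read being replaced by a read of a temporary, which is only legal under the assumption that the intervening call does not write the field and that no concurrently running task modifies it.

```java
public class LoadEliminationSketch {
    // Hypothetical heap object with a single field.
    static final class Point {
        int x;
        Point(int x) { this.x = x; }
    }

    static int calls = 0;

    // Hypothetical callee: it touches only the static counter, never Point.x.
    static void helper() {
        calls++;
    }

    // Before: a compiler that makes worst-case assumptions about helper()
    // must reload p.x after the call, so two getfield operations execute.
    static int before(Point p) {
        int a = p.x;   // first getfield
        helper();
        int b = p.x;   // second getfield
        return a + b;
    }

    // After: once side-effect analysis shows that helper() cannot write
    // Point.x, the second load is replaced by a read of the temporary t.
    // Assumption: no other task writes p.x between the two reads.
    static int after(Point p) {
        int t = p.x;   // the only getfield
        helper();
        return t + t;
    }

    public static void main(String[] args) {
        Point p = new Point(3);
        System.out.println(before(p)); // prints 6
        System.out.println(after(p));  // prints 6
    }
}
```

Both methods return the same value here, but `after` performs one heap read instead of two. Proving that such a rewrite is safe across method calls and parallel constructs is the kind of side-effect reasoning the abstract refers to.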

Vivek Sarkar | Rajkishore Barik
