Paper Information - Software Support For Improving Locality in Scientific Codes

Software Support For Improving Locality in Scientific Codes

We propose to develop and evaluate software support for improving locality in advanced scientific applications. We will investigate the compiler and run-time techniques needed to achieve high performance on both sequential and parallel machines. We will focus on two areas. First, iterative solvers for 3D partial differential equations (PDEs) have poor locality because elements that are neighbors along the higher dimensions lie far apart in memory. Careful tiling and padding can frequently recapture such reuse. Second, computations on adaptive meshes and sparse matrices experience many cache misses because they access data irregularly. Data layout and access order can be rearranged according to mesh connections or geometric location to improve locality, with cost models guiding how often the transformations are applied in adaptive computations.
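The second focus area, rearranging data layout by geometric location, can be illustrated with a minimal sketch. The abstract does not say which ordering is used; the Morton (Z-order) key below is one common space-filling-curve choice and is an assumption, as are the hypothetical names `morton_key`, `reorder_by_location`, and `index_span`.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three non-negative ints (Z-order key)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def reorder_by_location(coords, edges, bits=10):
    """Renumber mesh nodes in Morton order of their quantized coordinates.

    coords: list of (x, y, z) floats; edges: list of (u, v) node-index pairs.
    Returns (new_coords, new_edges, perm) with perm[new_index] = old_index.
    """
    n = len(coords)
    lo = [min(c[d] for c in coords) for d in range(3)]
    hi = [max(c[d] for c in coords) for d in range(3)]
    scale = (1 << bits) - 1

    def quantize(c):
        # Map each coordinate onto an integer grid cell in [0, 2**bits - 1].
        cell = []
        for d in range(3):
            span = hi[d] - lo[d]
            t = (c[d] - lo[d]) / span if span > 0 else 0.0
            cell.append(int(t * scale))
        return cell

    perm = sorted(range(n),
                  key=lambda i: morton_key(*quantize(coords[i]), bits=bits))
    new_index = [0] * n
    for new, old in enumerate(perm):
        new_index[old] = new
    new_coords = [coords[old] for old in perm]
    # Rewrite edges so they refer to the renumbered nodes.
    new_edges = [(new_index[u], new_index[v]) for u, v in edges]
    return new_coords, new_edges, perm


def index_span(edges):
    """Sum of |u - v| over edges: a crude proxy for neighbor distance in memory."""
    return sum(abs(u - v) for u, v in edges)
```

Comparing `index_span(edges)` before and after `reorder_by_location` gives a rough indication of whether connected nodes ended up closer together in the node array. For an adaptive mesh, such a reordering would be repeated only occasionally, which is where the cost models mentioned above would come in.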

[1] Michael L. Scott, et al. False sharing and its effect on shared memory performance, 1993.

[2] Chau-Wen Tseng, et al. Data transformations for eliminating conflict misses, 1998, PLDI.

[3] Toshio Nakatani, et al. Detection and global optimization of reduction operations for distributed parallel machines, 1996, ICS '96.

[4] Joel H. Saltz, et al. Runtime and language support for compiling adaptive irregular programs on distributed-memory machines, 1995, Softw. Pract. Exp.

[5] Chau-Wen Tseng, et al. Improving compiler and run-time support for adaptive irregular codes, 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[6] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPLAS.

[7] Ulrich Rüde, et al. Performance Analysis and Optimization of Numerically Intensive Programs, 1992.

[8] Vivek Sarkar, et al. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness, 1994, CASCON.

[9] Michael Wolfe, et al. More iteration space tiling, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).

[10] James R. Larus, et al. Optimizing communication in HPF programs on fine-grain distributed shared memory, 1997, PPOPP '97.

[11] Sanjay Ranka, et al. Architecture-independent locality-improving transformations of computational graphs embedded in k-dimensions, 1995, ICS '95.

[12] Vivek Sarkar, et al. On Estimating and Enhancing Cache Effectiveness, 1991, LCPC.

[13] Shang-Hua Teng, et al. High performance Fortran for highly irregular problems, 1997, PPOPP '97.

[14] Ken Kennedy, et al. Compiler blockability of numerical algorithms, 1992, Proceedings Supercomputing '92.

[15] Sharad Malik, et al. Cache miss equations: an analytical representation of cache misses, 1997, ICS '97.

[16] M. Gerndt, et al. SUPERB support for irregular scientific computations, 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92.

[17] Monica S. Lam, et al. Data and computation transformations for multiprocessors, 1995, PPOPP '95.

[18] Sharad Malik, et al. Precise miss analysis for program transformations with caches of arbitrary associativity, 1998, ASPLOS VIII.

[19] Ken Kennedy, et al. Optimizing for parallelism and data locality, 1992.

[20] Ken Kennedy, et al. Improving memory hierarchy performance for irregular applications, 1999, ICS '99.

[21] Prithviraj Banerjee, et al. Exploiting spatial regularity in irregular iterative applications, 1995, Proceedings of 9th International Parallel Processing Symposium.

[22] Joel H. Saltz, et al. The design and implementation of a parallel unstructured Euler solver using software primitives, ICASE Report No. 92-12, 1992.

[23] Joel H. Saltz, et al. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures, 1994, J. Parallel Distributed Comput.

[24] Hui Li, et al. NUMACROS: data parallel programming on NUMA multiprocessors, 1993.

[25] Prithviraj Banerjee, et al. Techniques to overlap computation and communication in irregular iterative applications, 1994, ICS '94.

[26] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.

[27] Olivier Temam, et al. Cache interference phenomena, 1994, SIGMETRICS.

[28] Skef Wholey. Automatic data mapping for distributed-memory parallel computers, 1992, ICS '92.

[29] Bo Lu, et al. Compiler optimization of implicit reductions for distributed memory multiprocessors, 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.

[30] Sanjay Ranka, et al. Memory hierarchy management for iterative graph structures, 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.

[31] Joel H. Saltz, et al. Dynamic Remapping of Parallel Computations with Varying Resource Demands, 1988, IEEE Trans. Computers.

[32] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.

[33] Mahmut T. Kandemir, et al. A compiler algorithm for optimizing locality in loop nests, 1997, ICS '97.

[34] Ken Kennedy, et al. Automatic Data Layout for High Performance Fortran, 1995, SC.

[35] Susan J. Eggers, et al. Reducing false sharing on shared memory multiprocessors through compile time data transformations, 1995, PPOPP '95.

[36] E. Cuthill, et al. Reducing the bandwidth of sparse symmetric matrices, 1969, ACM '69.

[37] Todd C. Mowry, et al. Compiler-directed page coloring for multiprocessors, 1996, ASPLOS VII.

[38] Josep Torrellas, et al. False Sharing and Spatial Locality in Multiprocessor Caches, 1994, IEEE Trans. Computers.

[39] Prithviraj Banerjee, et al. Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers, 1995, ICS '95.

[40] Ken Kennedy, et al. GIVE-N-TAKE: a balanced code placement framework, 1994, PLDI '94.

[41] Harry A. G. Wijshoff, et al. Managing pages in shared virtual memory systems: getting the compiler into the game, 1993, ICS '93.

[42] Shashi Shekhar, et al. Partitioning Similarity Graphs: A Framework for Declustering Problems, 1996, Inf. Syst.