Software Support For Improving Locality in Scientific Codes
[1] Michael L. Scott,et al. False sharing and its effect on shared memory performance , 1993 .
[2] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[3] Toshio Nakatani,et al. Detection and global optimization of reduction operations for distributed parallel machines , 1996, ICS '96.
[4] Joel H. Saltz,et al. Runtime and language support for compiling adaptive irregular programs on distributed‐memory machines , 1995, Softw. Pract. Exp..
[5] Chau-Wen Tseng,et al. Improving compiler and run-time support for adaptive irregular codes , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[6] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPL.
[7] Ulrich Rüde,et al. Performance Analysis and Optimization of Numerically Intensive Programs , 1992 .
[8] Vivek Sarkar,et al. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness , 1994, CASCON.
[9] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[10] James R. Larus,et al. Optimizing communication in HPF programs on fine-grain distributed shared memory , 1997, PPOPP '97.
[11] Sanjay Ranka,et al. Architecture-independent locality-improving transformations of computational graphs embedded in k-dimensions , 1995, ICS '95.
[12] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[13] Shang-Hua Teng,et al. High performance Fortran for highly irregular problems , 1997, PPOPP '97.
[14] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[15] Sharad Malik,et al. Cache miss equations: an analytical representation of cache misses , 1997, ICS '97.
[16] M. Gerndt,et al. SUPERB support for irregular scientific computations , 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92..
[17] Monica S. Lam,et al. Data and computation transformations for multiprocessors , 1995, PPOPP '95.
[18] Sharad Malik,et al. Precise miss analysis for program transformations with caches of arbitrary associativity , 1998, ASPLOS VIII.
[19] Ken Kennedy,et al. Optimizing for parallelism and data locality , 1992 .
[20] Ken Kennedy,et al. Improving memory hierarchy performance for irregular applications , 1999, ICS '99.
[21] Prithviraj Banerjee,et al. Exploiting spatial regularity in irregular iterative applications , 1995, Proceedings of 9th International Parallel Processing Symposium.
[22] Joel H. Saltz,et al. The design and implementation of a parallel unstructured Euler solver using software primitives , 2022, ICASE Report No. 92-12.
[23] Joel H. Saltz,et al. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures , 1994, J. Parallel Distributed Comput..
[24] Hui Li,et al. NUMACROS: data parallel programming on NUMA multiprocessors , 1993 .
[25] Prithviraj Banerjee,et al. Techniques to overlap computation and communication in irregular iterative applications , 1994, ICS '94.
[26] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[27] Olivier Temam,et al. Cache interference phenomena , 1994, SIGMETRICS.
[28] Skef Wholey. Automatic data mapping for distributed-memory parallel computers , 1992, ICS '92.
[29] Bo Lu,et al. Compiler optimization of implicit reductions for distributed memory multiprocessors , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[30] Sanjay Ranka,et al. Memory hierarchy management for iterative graph structures , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[31] Joel H. Saltz,et al. Dynamic Remapping of Parallel Computations with Varying Resource Demands , 1988, IEEE Trans. Computers.
[32] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[33] Mahmut T. Kandemir,et al. A compiler algorithm for optimizing locality in loop nests , 1997, ICS '97.
[34] Ken Kennedy,et al. Automatic Data Layout for High Performance Fortran , 1995, SC.
[35] Susan J. Eggers,et al. Reducing false sharing on shared memory multiprocessors through compile time data transformations , 1995, PPOPP '95.
[36] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.
[37] Todd C. Mowry,et al. Compiler-directed page coloring for multiprocessors , 1996, ASPLOS VII.
[38] Josep Torrellas,et al. False Sharing and Spatial Locality in Multiprocessor Caches , 1994, IEEE Trans. Computers.
[39] Prithviraj Banerjee,et al. Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers , 1995, ICS '95.
[40] Ken Kennedy,et al. GIVE-N-TAKE—a balanced code placement framework , 1994, PLDI '94.
[41] Harry A. G. Wijshoff,et al. Managing pages in shared virtual memory systems: getting the compiler into the game , 1993, ICS '93.
[42] Shashi Shekhar,et al. Partitioning Similarity Graphs: A Framework for Declustering Problems , 1996, Inf. Syst..
[43] Susan J. Eggers,et al. Eliminating False Sharing , 1991, ICPP.
[44] Andrew B. Kahng,et al. Recent directions in netlist partitioning , 1995 .
[45] Nenad Nedeljkovic,et al. Data distribution support on distributed shared memory multiprocessors , 1997, PLDI '97.
[46] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[47] Dennis Gannon,et al. Strategies for cache and local memory management by global program transformation , 1988, J. Parallel Distributed Comput..
[48] Vivek Sarkar,et al. Automatic selection of high-order transformations in the IBM XL FORTRAN compilers , 1997, IBM J. Res. Dev..
[49] Reinhard von Hanxleden. Handling Irregular Problems with Fortran D: A Preliminary Report , 1993, D Newsletter #9.
[50] W. Jalby,et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93.
[51] Chau-Wen Tseng,et al. A Comparison of Compiler Tiling Algorithms , 1999, CC.
[52] Alok N. Choudhary,et al. An efficient uniform run-time scheme for mixed regular-irregular applications , 1998, ICS '98.
[53] Margaret Martonosi,et al. Evaluating the impact of advanced memory systems on compiler-parallelized codes , 1995, PACT.
[54] K. Kennedy,et al. Preliminary experiences with the Fortran D compiler , 1993, Supercomputing '93.
[55] Marina C. Chen,et al. The Data Alignment Phase in Compiling Programs for Distributed-Memory Machines , 1991, J. Parallel Distributed Comput..
[56] James R. Larus,et al. Efficient support for irregular applications on distributed-memory machines , 1995, PPOPP '95.
[57] Martin C. Rinard,et al. Commutativity analysis: a new analysis technique for parallelizing compilers , 1997, TOPL.
[58] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[59] Hermann Hellwagner,et al. Data Local Iterative Methods For The Efficient Solution of Partial Differential Equations , 1997 .
[60] William M. Pottenger,et al. The role of associativity and commutativity in the detection and transformation of loop-level parallelism , 1998, ICS '98.
[61] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[62] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[63] Vipin Kumar,et al. Analysis of Multilevel Graph Partitioning , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[64] Chau-Wen Tseng,et al. Eliminating conflict misses for high performance architectures , 1998, ICS '98.
[65] Keshav Pingali,et al. An experimental evaluation of tiling and shackling for memory hierarchy management , 1999, ICS '99.
[66] Mahmut T. Kandemir,et al. Improving locality using loop and data transformations in an integrated framework , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[67] Joel H. Saltz,et al. Compiler and runtime support for irregularly coupled regular meshes , 1992, ICS '92.
[68] Alan L. Cox,et al. Compiler and software distributed shared memory support for irregular applications , 1997, PPOPP '97.
[69] G. Karypis,et al. Multilevel k-way hypergraph partitioning , 1999, Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361).
[70] Tarek S. Abdelrahman,et al. Fusion of Loops for Parallelism and Locality , 1997, IEEE Trans. Parallel Distributed Syst..
[71] Monica S. Lam,et al. Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.
[72] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[73] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..