Improving Locality for Adaptive Irregular Scientific Codes
[1] Joel H. Saltz,et al. Run-Time Parallelization and Scheduling of Loops , 1991, IEEE Trans. Computers.
[2] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..
[3] Larry Carter,et al. Localizing non-affine array references , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[4] Joel H. Saltz,et al. The Design and Implementation of a Parallel Unstructured Euler Solver Using Software Primitives , ICASE Report No. 92-12 , 1992 .
[5] Joel H. Saltz,et al. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures , 1994, J. Parallel Distributed Comput..
[6] Vipin Kumar,et al. Analysis of Multilevel Graph Partitioning , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[7] Shang-Hua Teng,et al. High performance Fortran for highly irregular problems , 1997, PPOPP '97.
[8] Steven W. K. Tjiang,et al. SUIF: an infrastructure for research on parallelizing and optimizing compilers , 1994, SIGP.
[9] James R. Larus,et al. Cache-conscious structure definition , 1999, PLDI '99.
[10] Vivek Sarkar,et al. Automatic selection of high-order transformations in the IBM XL FORTRAN compilers , 1997, IBM J. Res. Dev..
[11] Sharad Malik,et al. Cache miss equations: an analytical representation of cache misses , 1997, ICS '97.
[12] Reinhard von Hanxleden. Handling Irregular Problems with Fortran D: A Preliminary Report , D Newsletter #9 , 1993 .
[13] Joel H. Saltz,et al. Runtime and language support for compiling adaptive irregular programs on distributed‐memory machines , 1995, Softw. Pract. Exp..
[14] Chau-Wen Tseng,et al. Enhancing software DSM for compiler-parallelized applications , 1997, Proceedings 11th International Parallel Processing Symposium.
[15] James R. Larus,et al. Compiler-directed Shared-Memory Communication for Iterative Parallel Applications , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[16] Mary W. Hall,et al. Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[17] Emilio L. Zapata,et al. A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors , 2000, ICS '00.
[18] A. H. Sherman,et al. Comparative Analysis of the Cuthill–McKee and the Reverse Cuthill–McKee Ordering Algorithms for Sparse Matrices , 1976 .
[19] Ken Kennedy,et al. GIVE-N-TAKE—a balanced code placement framework , 1994, PLDI '94.
[20] Alan L. Cox,et al. Compiler and software distributed shared memory support for irregular applications , 1997, PPOPP '97.
[21] J. Mark Bull,et al. Feedback Guided Dynamic Loop Scheduling: Algorithms and Experiments , 1998, Euro-Par.
[22] Alok N. Choudhary,et al. An efficient uniform run-time scheme for mixed regular-irregular applications , 1998, ICS '98.
[23] Bo Lu,et al. Compiler optimization of implicit reductions for distributed memory multiprocessors , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[24] Ken Kennedy,et al. Improving memory hierarchy performance for irregular applications , 1999, ICS '99.
[25] Prithviraj Banerjee,et al. Exploiting spatial regularity in irregular iterative applications , 1995, Proceedings of 9th International Parallel Processing Symposium.
[26] G. Karypis,et al. Multilevel k-way hypergraph partitioning , 1999, Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361).
[27] Andrew B. Kahng,et al. Recent directions in netlist partitioning , 1995 .
[28] Toshio Nakatani,et al. Detection and global optimization of reduction operations for distributed parallel machines , 1996, ICS '96.
[29] Joel H. Saltz,et al. Run-time parallelization and scheduling of loops , 1989, SPAA '89.
[30] James R. Larus,et al. Optimizing communication in HPF programs on fine-grain distributed shared memory , 1997, PPOPP '97.
[31] Chau-Wen Tseng,et al. Improving compiler and run-time support for adaptive irregular codes , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[32] Keshav Pingali,et al. Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming , 1997, PPoPP 1997.
[33] Shashi Shekhar,et al. Partitioning Similarity Graphs: A Framework for Declustering Problems , 1996, Inf. Syst..
[34] Shahid H. Bokhari,et al. A Partitioning Strategy for Nonuniform Problems on Multiprocessors , 1987, IEEE Transactions on Computers.
[35] Martin C. Rinard,et al. Commutativity analysis: a new analysis technique for parallelizing compilers , 1997, TOPL.
[36] Horst D. Simon,et al. Partitioning of unstructured problems for parallel processing , 1991 .
[37] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[38] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[39] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.
[40] William M. Pottenger,et al. The role of associativity and commutativity in the detection and transformation of loop-level parallelism , 1998, ICS '98.
[41] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[42] Todd C. Mowry,et al. Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation , 1999, ISCA.
[43] Shahid H. Bokhari,et al. A Partitioning Strategy for PDEs Across Multiprocessors , 1985, ICPP.
[44] Chau-Wen Tseng,et al. A Comparison of Locality Transformations for Irregular Codes , 2000, LCR.
[45] James R. Larus,et al. Cache-conscious structure layout , 1999, PLDI '99.
[46] Alan L. Cox,et al. An integrated compile-time/run-time software distributed shared memory system , 1996, ASPLOS VII.
[47] Mahmut T. Kandemir,et al. Improving locality using loop and data transformations in an integrated framework , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[48] Sanjay Ranka,et al. Memory hierarchy management for iterative graph structures , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[49] Joel H. Saltz,et al. Dynamic Remapping of Parallel Computations with Varying Resource Demands , 1988, IEEE Trans. Computers.
[50] Joel H. Saltz,et al. Principles of runtime support for parallel processors , 1988, ICS '88.
[51] Ken Kennedy,et al. Improving cache performance in dynamic applications through data and computation reorganization at run time , 1999, PLDI '99.
[52] Harry Berryman,et al. Parallel Loops on Distributed Machines , 1990, Proceedings of the Fifth Distributed Memory Computing Conference.
[53] Alan L. Cox,et al. Evaluating the performance of software distributed shared memory as a target for parallelizing compilers , 1997, Proceedings 11th International Parallel Processing Symposium.
[54] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPL.
[55] Ken Kennedy,et al. Compiling Fortran D for MIMD distributed-memory machines , 1992, CACM.
[56] David A. Padua,et al. On the Automatic Parallelization of Sparse and Irregular Fortran Programs , 1998, LCR.
[57] David A. Padua,et al. Compiler analysis of irregular memory accesses , 2000, PLDI '00.
[58] James R. Larus,et al. Efficient support for irregular applications on distributed-memory machines , 1995, PPOPP '95.
[59] Ken Kennedy,et al. Inter-array Data Regrouping , 1999, LCPC.
[60] K. Kennedy,et al. Automatic Data Layout for High Performance Fortran , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[61] Erik Brunvand,et al. Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[62] K. Kennedy,et al. Preliminary experiences with the Fortran D compiler , 1993, Supercomputing '93.