Achieving Scalable Locality with Time Skewing
[1] David W. Binkley, et al. Program slicing, 2008, 2008 Frontiers of Software Maintenance.
[2] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[3] David G. Wonnacott, et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[4] G. Roth, et al. Compiling Stencils in High Performance Fortran, 1997, ACM/IEEE SC 1997 Conference (SC'97).
[5] William W. Pugh, et al. Fine-grained analysis of array computations, 1998.
[6] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[7] Ken Kennedy, et al. Transforming loops to recursion for multi-level memory hierarchies, 2000, PLDI '00.
[8] William Pugh, et al. The Omega Library interface guide, 1995.
[9] Emmett Witchel, et al. Techniques for Increasing and Detecting Memory Alignment, 2001.
[10] William Pugh, et al. Counting solutions to Presburger formulas: how and why, 1994, PLDI '94.
[11] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[12] Vivek Sarkar, et al. Baring It All to Software: Raw Machines, 1997, Computer.
[13] Anoop Gupta, et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[14] François Irigoin, et al. Supernode partitioning, 1988, POPL '88.
[15] David A. Padua, et al. Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs, 1991, LCPC.
[16] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPLAS.
[17] David A. Padua, et al. On the Automatic Parallelization of the Perfect Benchmarks, 1998, IEEE Trans. Parallel Distributed Syst.
[18] Dennis Gannon, et al. Strategies for cache and local memory management by global program transformation, 1988, J. Parallel Distributed Comput.
[19] Michael E. Wolf, et al. Improving locality and parallelism in nested loops, 1992.
[20] William Pugh, et al. An Exact Method for Analysis of Value-based Array Data Dependences, 1993, LCPC.
[21] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[22] Ken Kennedy, et al. Estimating Interlock and Improving Balance for Pipelined Architectures, 1988, J. Parallel Distributed Comput.
[23] Cheng Wang, et al. Data locality enhancement by memory reduction, 2001, ICS '01.
[24] William Pugh, et al. Eliminating false data dependences using the Omega test, 1992, PLDI '92.
[25] John D. McCalpin, et al. Time Skewing: A Value-Based Approach to Optimizing for Memory Locality, 1999.
[26] William Pugh, et al. Determining schedules based on performance estimation, 1993.
[27] David G. Wonnacott, et al. Time Skewing for Parallel Computers, 1999, LCPC.
[28] David G. Wonnacott. Extending Scalar Optimizations for Arrays, 2000, LCPC.
[29] William Pugh, et al. Constraint-based array dependence analysis, 1998, TOPLAS.
[30] Robert Sedgewick, et al. Algorithms in C, 1990.
[31] W. Kelly, et al. Code generation for multiple mappings, 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.
[32] Nenad Nedeljkovic, et al. Data distribution support on distributed shared memory multiprocessors, 1997, PLDI '97.
[33] W. Jalby, et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts, 1993, Supercomputing '93.