Cache-aware partitioning of multi-dimensional iteration spaces
Alexander V. Veidenbaum | Alexandru Nicolau | Constantine D. Polychronopoulos | Arun Kejariwal | Utpal Banerjee
[1] Siddhartha Chatterjee, et al. Exact analysis of the cache behavior of nested loops, 2001, PLDI '01.
[2] Jang-Ping Sheu, et al. Partitioning and Mapping Nested Loops on Multiprocessor Systems, 1991, IEEE Trans. Parallel Distributed Syst.
[3] Constantine D. Polychronopoulos. Loop Coalescing: A Compiler Transformation for Parallel Machines, 1987, ICPP.
[4] Kunle Olukotun, et al. The Future of Microprocessors, 2005, ACM Queue.
[5] C. Jousselin, et al. An algebraic memory model, 1989, CARN.
[6] Nectarios Koziris, et al. Evaluation of loop grouping methods based on orthogonal projection spaces, 2000, Proceedings 2000 International Conference on Parallel Processing.
[7] Constantine D. Polychronopoulos, et al. Symbolic analysis for parallelizing compilers, 1996, TOPL.
[8] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[9] Andreas Krall, et al. Improving semi-static branch prediction by code replication, 1994, PLDI '94.
[10] Michael E. Wolf, et al. Combining Loop Transformations Considering Caches and Scheduling, 2004, International Journal of Parallel Programming.
[11] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.
[12] G. H. Barnes, et al. A controllable MIMD architecture, 1986.
[13] James R. Larus, et al. Exploiting hardware performance counters with flow and context sensitive profiling, 1997, PLDI '97.
[14] Michael Wolfe, et al. Iteration Space Tiling for Memory Hierarchies, 1987, PPSC.
[15] Josep Llosa, et al. Optimizing cache miss equations polyhedra, 2000, CARN.
[16] Uri C. Weiser, et al. Nahalal: Cache Organization for Chip Multiprocessors, 2007, IEEE Computer Architecture Letters.
[17] David A. Padua, et al. Advanced compiler optimizations for supercomputers, 1986, CACM.
[18] Thomas R. Gross, et al. Using Platform-Specific Performance Counters for Dynamic Compilation, 2005, LCPC.
[19] Alexandru Nicolau, et al. A Geometric Approach for Partitioning N-Dimensional Non-rectangular Iteration Spaces, 2004, LCPC.
[20] Alexander V. Veidenbaum, et al. Effects of Program Restructuring, Algorithm Change, and Architecture Choice on Program Performance, 1984.
[21] Michael O'Boyle, et al. Program and data transformations for efficient execution on distributed memory architectures, 1993, Technical report series.
[22] H. V. Jagadish, et al. An intelligent memory system, 1988, CARN.
[23] James R. Larus, et al. Branch prediction for free, 1993, PLDI '93.
[24] Graham R. Nudd, et al. Analytical Modeling of Set-Associative Cache Behavior, 1999, IEEE Trans. Computers.
[25] Emilio L. Zapata, et al. A compiler tool to predict memory hierarchy performance of scientific codes, 2004, Parallel Comput.
[26] Alexandru Nicolau, et al. A novel approach for partitioning iteration spaces with variable densities, 2005, PPoPP.
[27] Vivek Sarkar, et al. Parallel Program Graphs and their Classification, 1993, LCPC.
[28] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[29] Monica S. Lam, et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, 1991, IEEE Trans. Parallel Distributed Syst.
[30] Vivek Sarkar, et al. On Estimating and Enhancing Cache Effectiveness, 1991, LCPC.
[31] Sharad Malik, et al. Cache miss equations: an analytical representation of cache misses, 1997, ICS '97.
[32] Zhiyuan Li. Array privatization for parallel execution of loops, 1992, ICS.
[33] Rizos Sakellariou, et al. On the Quest for Perfect Load Balance in Loop-Based Parallel Computations, 1996.
[34] Erik H. D'Hollander, et al. Partitioning and Labeling of Loops by Unimodular Transformations, 1992, IEEE Trans. Parallel Distributed Syst.
[35] Jang-Ping Sheu, et al. Partitioning and mapping of nested loops for linear array multicomputers, 1995, The Journal of Supercomputing.
[36] James R. Larus, et al. Software and the Concurrency Revolution, 2005, ACM Queue.
[37] Milind Girkar, et al. A general approach for partitioning N-dimensional parallel nested loops with conditionals, 2006, SPAA '06.
[38] Thomas R. Gross, et al. Online optimizations driven by hardware performance monitoring, 2007, PLDI '07.
[39] David A. Padua, et al. Execution of Parallel Loops on Parallel Processor Systems, 1986, ICPP.
[40] Olivier Temam, et al. Cache interference phenomena, 1994, SIGMETRICS.
[41] Arogyaswami Paulraj, et al. Loop partitioning for distributed memory multiprocessors as unimodular transformations, 1991, ICS '91.